MARKOV CHAINS
Abstract. Let $K$ be an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$. We construct a metric $\mathcal{W}$ on the set of probability measures on $\mathcal{X}$ and show that with respect to this metric, the law of the continuous time Markov chain evolves as the gradient flow of the entropy. This result is a discrete counterpart of the Wasserstein gradient flow interpretation of the heat flow in $\mathbb{R}^n$ by Jordan, Kinderlehrer, and Otto (1998). The metric $\mathcal{W}$ is similar to, but different from, the $L^2$-Wasserstein metric, and is defined via a discrete variant of the Benamou-Brenier formula.



1. Introduction 

Since the seminal work of Jordan, Kinderlehrer and Otto [14], it is known that the heat flow on $\mathbb{R}^n$ is the gradient flow of the Boltzmann-Shannon entropy with respect to the $L^2$-Wasserstein metric on the space of probability measures on $\mathbb{R}^n$. This discovery has been the starting point for many developments in evolution equations, probability theory and geometry. We refer to the monographs [1, 27, 28] for an overview. By now a similar interpretation of the heat flow has been established in a wide variety of settings, including Riemannian manifolds [10], Hilbert spaces [2], Wiener spaces [11], Finsler spaces [19], Alexandrov spaces [13] and metric measure spaces [12, 25].

Let $(K(x,y))_{x,y\in\mathcal{X}}$ be an irreducible and reversible Markov transition kernel on a finite set $\mathcal{X}$, and consider the continuous time semigroup $(H(t))_{t\ge0}$ associated with $K$. This semigroup is defined by $H(t) = e^{t(K-I)}$, and can be interpreted as the 'heat semigroup' on $\mathcal{X}$ with respect to the geometry determined by the Markov kernel $K$. Therefore it seems natural to ask whether the heat flow can also be identified as the gradient flow of an entropy functional with respect to some metric on the space of probability densities on $\mathcal{X}$. Unfortunately, it is easily seen that the $L^2$-Wasserstein metric over a discrete space is not appropriate for this purpose. In fact, since the metric derivative of the heat flow in the Wasserstein metric is typically infinite in a discrete setting, the heat flow cannot be interpreted as the gradient flow of any functional in the $L^2$-Wasserstein metric. (We refer to Section 2 for a more detailed discussion.)
The main contribution of this paper is the construction of a metric $\mathcal{W}$ on the space of probability densities on $\mathcal{X}$, which makes it possible to extend the interpretation of the heat flow as the gradient flow of the entropy to the setting of finite Markov chains.

Notation. As before, let $K : \mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a Markov kernel on a finite space $\mathcal{X}$, i.e.,

\[ K(x,y) \ge 0 \quad \forall x,y\in\mathcal{X}, \qquad \sum_{y\in\mathcal{X}} K(x,y) = 1 \quad \forall x\in\mathcal{X}. \]

We assume that $K$ is irreducible, which implies the existence of a unique steady state $\pi$. Thus $\pi$ is a probability measure on $\mathcal{X}$, represented by a row vector that is invariant under right-multiplication by $K$:

\[ \pi(y) = \sum_{x\in\mathcal{X}} \pi(x)K(x,y). \]

It follows from elementary Markov chain theory that $\pi$ is strictly positive. We shall assume that $K$ is reversible, i.e., $\pi(x)K(x,y) = \pi(y)K(y,x)$ for any $x,y\in\mathcal{X}$. Consider the set

\[ \mathscr{P}(\mathcal{X}) := \Big\{ \rho : \mathcal{X}\to\mathbb{R} \ \Big|\ \rho(x)\ge0 \ \ \forall x\in\mathcal{X};\ \sum_{x\in\mathcal{X}} \pi(x)\rho(x) = 1 \Big\} \]

consisting of all probability densities on $\mathcal{X}$. The subset consisting of those probability densities that are strictly positive is denoted by $\mathscr{P}_*(\mathcal{X})$. The relative entropy of a probability density $\rho\in\mathscr{P}(\mathcal{X})$ with respect to $\pi$ is defined by

\[ \mathcal{H}(\rho) = \sum_{x\in\mathcal{X}} \pi(x)\rho(x)\log\rho(x), \tag{1.1} \]

with the usual convention that $\rho(x)\log\rho(x) = 0$ if $\rho(x) = 0$.
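
The notation above can be made concrete on a small example. The following is a minimal sketch, assuming a hypothetical three-state birth-death kernel that does not appear in the paper; the helper names `stationary`, `is_reversible` and `entropy` are illustrative only. It approximates the steady state $\pi$, checks the detailed balance condition, and evaluates the relative entropy (1.1).

```python
import math

# Hypothetical 3-state birth-death kernel (rows sum to 1); not taken from the paper.
K = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

def stationary(K, iters=2000):
    """Approximate the row vector pi = pi K by power iteration on the lazy kernel (I + K)/2."""
    n = len(K)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[x] * K[x][y] for x in range(n)) for y in range(n)]
        pi = [0.5 * (pi[y] + step[y]) for y in range(n)]
    return pi

def is_reversible(K, pi, tol=1e-9):
    """Check detailed balance pi(x) K(x,y) = pi(y) K(y,x) for all pairs."""
    n = len(K)
    return all(abs(pi[x] * K[x][y] - pi[y] * K[y][x]) <= tol
               for x in range(n) for y in range(n))

def entropy(rho, pi):
    """Relative entropy H(rho) = sum_x pi(x) rho(x) log rho(x), with 0 log 0 = 0."""
    return sum(p * r * math.log(r) for p, r in zip(pi, rho) if r > 0)

pi = stationary(K)
uniform = [1.0, 1.0, 1.0]        # density of pi itself
dirac = [1.0 / pi[0], 0.0, 0.0]  # density of the Dirac mass at the first state
```

In this sketch the uniform density has entropy zero, while the density of a Dirac mass at $x$ has entropy $-\log\pi(x)$.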

Wasserstein-like metrics in a discrete setting. To motivate the definition of the metric $\mathcal{W}$, recall that for probability densities $\rho_0,\rho_1$ on $\mathbb{R}^n$, the Benamou-Brenier formula [3] asserts that the squared Wasserstein distance $W_2$ satisfies the identity

\[ W_2(\rho_0,\rho_1)^2 = \inf_{\rho,\psi}\Big\{ \int_0^1\!\!\int_{\mathbb{R}^n} |\nabla\psi_t(x)|^2\,\rho_t(x)\,dx\,dt \Big\}, \tag{1.2} \]

where the infimum runs over sufficiently regular curves $\rho : [0,1]\to\mathscr{P}(\mathbb{R}^n)$ and $\psi : [0,1]\times\mathbb{R}^n\to\mathbb{R}$ satisfying the continuity equation

\[ \begin{cases} \partial_t\rho + \nabla\cdot(\rho\nabla\psi) = 0,\\ \rho(0) = \rho_0, \quad \rho(1) = \rho_1. \end{cases} \tag{1.3} \]

Here, by a slight abuse of notation, $\mathscr{P}(\mathbb{R}^n)$ denotes the set of probability densities on $\mathbb{R}^n$. At least formally, the Benamou-Brenier formula has been interpreted by Otto [23] as a Riemannian metric on the space of probability densities on $\mathbb{R}^n$.

In the discrete setting, we shall define a class of pseudo-metrics $\mathcal{W}$ (i.e., metrics which possibly attain the value $+\infty$) by mimicking the formulas (1.2) and (1.3).
In order to obtain a metric with the desired properties, it turns out to be necessary to define, for $\rho\in\mathscr{P}(\mathcal{X})$ and $x,y\in\mathcal{X}$,

\[ \hat\rho(x,y) := \theta(\rho(x),\rho(y)), \]

where $\theta : \mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$ is a function satisfying (A1) - (A7) below. At this stage we remark that typical examples of admissible functions are the logarithmic mean $\theta(s,t) = \int_0^1 s^{1-p}t^p\,dp$, the geometric mean $\theta(s,t) = \sqrt{st}$ and, more generally, the functions $\theta(s,t) = s^\alpha t^\alpha$ for $\alpha>0$.
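
For $s,t>0$ with $s\neq t$, the integral defining the logarithmic mean evaluates to $(s-t)/(\log s-\log t)$. The sketch below, with illustrative function names that are not part of the paper, compares this closed form with a midpoint-rule approximation of the integral and with the geometric mean.

```python
import math

def log_mean(s, t):
    """Logarithmic mean theta(s, t) = (s - t) / (log s - log t), extended continuously."""
    if s == 0.0 or t == 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def log_mean_integral(s, t, n=20000):
    """Midpoint-rule approximation of int_0^1 s^(1-p) t^p dp."""
    h = 1.0 / n
    return h * sum(s ** (1.0 - (i + 0.5) * h) * t ** ((i + 0.5) * h) for i in range(n))

def geometric_mean(s, t):
    """Geometric mean theta(s, t) = sqrt(s t)."""
    return math.sqrt(s * t)
```

Both means vanish when one argument is zero, which is the behaviour later required in assumption (A5).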
Now we are ready to state the definition of W: 

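Definition. For $\rho_0,\rho_1\in\mathscr{P}(\mathcal{X})$ we set

\[ \mathcal{W}(\rho_0,\rho_1)^2 := \inf\Big\{ \frac12\int_0^1 \sum_{x,y\in\mathcal{X}} (\psi_t(x)-\psi_t(y))^2 K(x,y)\hat\rho_t(x,y)\pi(x)\,dt \Big\}, \]

where the infimum runs over all piecewise $C^1$ curves $\rho : [0,1]\to\mathscr{P}(\mathcal{X})$ and all measurable functions $\psi : [0,1]\to\mathbb{R}^{\mathcal{X}}$ satisfying, for a.e. $t\in[0,1]$,

\[ \begin{cases} \dfrac{d}{dt}\rho_t(x) + \sum_{y\in\mathcal{X}} (\psi_t(y)-\psi_t(x))K(x,y)\hat\rho_t(x,y) = 0 \quad \forall x\in\mathcal{X},\\[4pt] \rho(0) = \rho_0, \quad \rho(1) = \rho_1. \end{cases} \tag{1.4} \]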

Remark. Similar to the Wasserstein metric, $\mathcal{W}(\rho_0,\rho_1)^2$ can be interpreted as the cost of transporting mass from its initial configuration $\rho_0$ to the final configuration $\rho_1$. However, unlike the Wasserstein metric, the cost of transporting a unit mass from $x$ to $y$ depends on the amount of mass already present at $x$ and $y$. In a continuous setting, metrics with these properties have been studied in the recent papers [6, 9]. The essential new feature of the metric considered in this paper is the fact that the dependence is non-local.

In order to state the first main result of the paper, we introduce some notation. Fix a probability density $\rho\in\mathscr{P}(\mathcal{X})$. We shall write $x\sim_\rho y$ if $x,y\in\mathcal{X}$ belong to the same connected component of the support of $\rho$. More formally, we say that $x\sim_\rho y$ if $x=y$, or if there exist $k\ge1$ and $x_1,\dots,x_k\in\mathcal{X}$ such that

\[ \hat\rho(x,x_1)K(x,x_1),\ \hat\rho(x_1,x_2)K(x_1,x_2),\ \dots,\ \hat\rho(x_k,y)K(x_k,y) > 0. \]

Furthermore, we set

\[ C_\theta := \int_0^1 \frac{1}{\sqrt{\theta(1-r,1+r)}}\,dr \in [0,\infty]. \]

It turns out that $C_\theta$ is the $\mathcal{W}$-distance between a Dirac mass and the uniform density on a two-point space $\{a,b\}$ endowed with the Markov kernel defined by $K(a,b) = K(b,a) = \frac12$. Note that $C_\theta$ is finite if $\theta$ is the logarithmic or geometric mean. If $\theta(s,t) = s^\alpha t^\alpha$, then $C_\theta$ is finite for $0<\alpha<2$ and infinite for $\alpha\ge2$.
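
The constant $C_\theta$ can be approximated numerically. The following sketch is an illustration under stated assumptions rather than part of the paper: it substitutes $r=\sin u$ to tame the endpoint singularity at $r=1$ and applies the midpoint rule; the function names are hypothetical. For $\theta(s,t)=st$ (the case $\alpha=1$) the integral equals $\pi/2$ exactly, which serves as a sanity check. Divergent cases such as $\alpha\ge2$ are not handled.

```python
import math

def log_mean(s, t):
    """Logarithmic mean, with theta(s, s) = s and theta = 0 if an argument vanishes."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def power_mean(alpha):
    """Return the function theta(s, t) = s^alpha t^alpha."""
    return lambda s, t: (s * t) ** alpha

def c_theta(theta, n=20000):
    """Midpoint rule for int_0^1 dr / sqrt(theta(1-r, 1+r)) after substituting r = sin(u)."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        r = math.sin(u)
        total += math.cos(u) / math.sqrt(theta(1.0 - r, 1.0 + r))
    return h * total

c_log = c_theta(log_mean)          # logarithmic mean
c_geo = c_theta(power_mean(0.5))   # geometric mean
c_one = c_theta(power_mean(1.0))   # theta(s, t) = s t
```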

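For $\sigma\in\mathscr{P}(\mathcal{X})$ we shall write

\[ \mathscr{P}_\sigma(\mathcal{X}) := \{\rho\in\mathscr{P}(\mathcal{X}) : \mathcal{W}(\rho,\sigma) < \infty\}. \]

The first main result of this paper reads as follows: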
Theorem 1.1. The following assertions hold:

(1) $\mathcal{W}$ defines a pseudo-metric on $\mathscr{P}(\mathcal{X})$.

(2) • If $C_\theta<\infty$, then $\mathcal{W}(\rho_0,\rho_1)<\infty$ for all $\rho_0,\rho_1\in\mathscr{P}(\mathcal{X})$.

• If $C_\theta=\infty$, the following are equivalent for $\rho_0,\rho_1\in\mathscr{P}(\mathcal{X})$:

(a) $\mathcal{W}(\rho_0,\rho_1)<\infty$;

(b) For all $x\in\mathcal{X}$ we have

\[ \sum_{y\sim_{\rho_0}x} \rho_0(y)\pi(y) = \sum_{y\sim_{\rho_1}x} \rho_1(y)\pi(y). \]
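(3) For all $\sigma\in\mathscr{P}(\mathcal{X})$, $\mathcal{W}$ metrizes the topology of weak convergence on $\mathscr{P}_\sigma(\mathcal{X})$.

(4) • If $C_\theta<\infty$ and $\theta$ is concave, the metric space $(\mathscr{P}_*(\mathcal{X}),\mathcal{W})$ is a Riemannian manifold.

• If $C_\theta=\infty$, the metric space $(\mathscr{P}_\sigma(\mathcal{X}),\mathcal{W})$ is a complete Riemannian manifold for all $\sigma\in\mathscr{P}(\mathcal{X})$.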

Remark (Finiteness). Part (2) of the theorem above provides a complete characterisation of finiteness of $\mathcal{W}$ for general Markov kernels, in terms of the behaviour of $\mathcal{W}$ for kernels on a two-point space. If $C_\theta=\infty$, the statement can be rephrased informally by saying that the distance $\mathcal{W}(\rho_0,\rho_1)$ is finite if and only if the following conditions hold: $\rho_0$ and $\rho_1$ have equal support, and both measures assign the same mass to each connected component of their support. In particular, it is important to note that the distance between two strictly positive densities is finite.

Remark (Weak convergence). Although (3) asserts that $\mathcal{W}$ metrizes the topology of weak convergence on $\mathscr{P}_\sigma(\mathcal{X})$ for every $\sigma\in\mathscr{P}(\mathcal{X})$, it follows from (2) that $\mathcal{W}$ does not metrize this topology on the full space $\mathscr{P}(\mathcal{X})$ if $C_\theta=\infty$. In fact, a weakly convergent sequence in $\mathscr{P}_\sigma(\mathcal{X})$ converges in $\mathcal{W}$-metric if and only if the weak limit belongs to $\mathscr{P}_\sigma(\mathcal{X})$.

Remark (Non-compactness). If $C_\theta=\infty$, we hasten to point out that the Riemannian manifold $(\mathscr{P}_\sigma(\mathcal{X}),\mathcal{W})$ can be a singleton. According to (2), this happens if and only if $K(x,y)\hat\sigma(x,y) = 0$ for every $x\in\operatorname{supp}\sigma$ and every $y\in\mathcal{X}$, which is for instance the case if $\sigma$ is the density of a Dirac measure. If $\mathscr{P}_\sigma(\mathcal{X})$ consists of more than one element, it turns out that $(\mathscr{P}_\sigma(\mathcal{X}),\mathcal{W})$ is non-compact. By contrast, the $L^2$-Wasserstein space over a compact metric space is compact.

Remark (Riemannian metric). The Riemannian metric on $(\mathscr{P}_*(\mathcal{X}),\mathcal{W})$ is a natural discrete analogue of the formal Riemannian metric on the Wasserstein space over $\mathbb{R}^n$. In fact, consider a smooth curve $(\rho_t)_{t\in[0,1]}$ in $\mathscr{P}_*(\mathcal{X})$ and take $t\in[0,1]$. In Section 3 we shall prove that there exists a unique discrete gradient $\nabla\psi_t = (\psi_t(x)-\psi_t(y))_{x,y\in\mathcal{X}}$ such that the continuity equation (1.4) holds. In view of this observation, we shall identify the tangent space at $\rho\in\mathscr{P}_*(\mathcal{X})$ with the collection of discrete gradients

\[ T_\rho := \{\nabla\psi\in\mathbb{R}^{\mathcal{X}\times\mathcal{X}} : \psi\in\mathbb{R}^{\mathcal{X}}\}. \]

We shall regard the discrete gradient $\nabla\psi_t$ as being the tangent vector along the curve $t\mapsto\rho_t$. The distance $\mathcal{W}$ is the Riemannian distance induced by the inner product $\langle\cdot,\cdot\rangle_\rho$ on $T_\rho$ given by

\[ \langle\nabla\varphi,\nabla\psi\rangle_\rho = \frac12\sum_{x,y\in\mathcal{X}} (\varphi(x)-\varphi(y))(\psi(x)-\psi(y))K(x,y)\hat\rho(x,y)\pi(x). \]

This formula is analogous to the corresponding expression in the continuous case [23]. In Section 3 we obtain a similar description of the Riemannian metric on each of the components of $\mathscr{P}(\mathcal{X})$. If $\rho$ is not strictly positive, the tangent space shall be identified with the collection of discrete gradients of an appropriate subset of functions on $\mathcal{X}$.

Remark (Two-point space). If $K$ is a reversible Markov kernel on a space $\mathcal{X}$ consisting of only two points, it is possible to obtain an explicit formula for the metric $\mathcal{W}$. We refer to Section 2 for an extensive discussion.

Example. If $C_\theta=\infty$, it follows from Theorem 1.1 that the incidence graph associated with the Markov kernel $K$ determines the topology of $(\mathscr{P}(\mathcal{X}),\mathcal{W})$. Let us illustrate this fact by two simple examples on a three-point space

\[ \mathcal{X} = \{x_1,x_2,x_3\}. \]

If $K(x_i,x_j)>0$ for all $i\ne j$, then the space $\mathscr{P}(\mathcal{X})$ consists of 7 distinct Riemannian manifolds:

• one 2-dimensional manifold: $\mathscr{P}_*(\mathcal{X})$;

• three 1-dimensional manifolds: for $i=1,2,3$,

\[ C_i := \{\rho\in\mathscr{P}(\mathcal{X}) : \rho(x_j) = 0 \text{ iff } j = i\}; \]

• three singletons: for $i=1,2,3$,

\[ D_i := \{\rho\in\mathscr{P}(\mathcal{X}) : \rho(x_j) = 0 \text{ iff } j\ne i\}. \]

If $K(x_1,x_2), K(x_2,x_3)>0$ and $K(x_1,x_3)=0$, then the space $\mathscr{P}(\mathcal{X})$ consists of infinitely many distinct Riemannian manifolds:

• one 2-dimensional manifold: $\mathscr{P}_*(\mathcal{X})$;

• two 1-dimensional manifolds: $C_1$ and $C_3$;

• infinitely many singletons: the three singletons $D_i$ for $i=1,2,3$, and the infinite collection

\[ \{\{\rho\} : \rho(x_1)>0,\ \rho(x_3)>0,\ \rho(x_2)=0\}. \]

The gradient flow of the entropy. Since the entropy functional $\mathcal{H}$ restricts to a smooth functional on the Riemannian manifold $(\mathscr{P}_*(\mathcal{X}),\mathcal{W})$, it makes sense to consider the associated gradient flow. Let $D_t\rho$ denote the tangent vector field along a smooth curve $\rho : (0,\infty)\to\mathscr{P}_*(\mathcal{X})$ and let $\operatorname{grad}\varphi$ denote the gradient of a smooth functional $\varphi : \mathscr{P}_*(\mathcal{X})\to\mathbb{R}$.

Consider the continuous time Markov semigroup $H(t) = e^{t(K-I)}$, $t\ge0$, associated with $K$. It follows from the theory of Markov chains that $H(t)$ maps $\mathscr{P}(\mathcal{X})$ into $\mathscr{P}_*(\mathcal{X})$ for $t>0$. The second main result of this paper asserts that the 'heat flow' determined by $H(t)$ is the gradient flow of the entropy $\mathcal{H}$ with respect to $\mathcal{W}$, if $\theta$ is the logarithmic mean.

Theorem 1.2 (Heat flow is gradient flow of entropy). Let $\theta$ be the logarithmic mean. For $\rho\in\mathscr{P}(\mathcal{X})$ and $t\ge0$, set $\rho_t := e^{t(K-I)}\rho$. Then the gradient flow equation

\[ D_t\rho = -\operatorname{grad}\mathcal{H}(\rho_t) \]

holds for all $t>0$.
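
A gradient flow of the entropy must in particular decrease the entropy along its trajectory. The sketch below illustrates this consequence numerically; it does not verify the gradient flow equation itself. It assumes a hypothetical three-state kernel with steady state $\pi=(\frac14,\frac12,\frac14)$ and approximates $\rho_t=e^{t(K-I)}\rho$ by explicit Euler steps, each of which is itself a Markov operator when the step size is at most 1.

```python
import math

# Hypothetical reversible 3-state kernel and its steady state (detailed balance holds).
K = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
PI = [0.25, 0.50, 0.25]

def entropy(rho, pi=PI):
    """Relative entropy H(rho) = sum_x pi(x) rho(x) log rho(x), with 0 log 0 = 0."""
    return sum(p * r * math.log(r) for p, r in zip(pi, rho) if r > 0)

def heat_step(rho, dt):
    """One explicit Euler step of d/dt rho = (K - I) rho, with K acting on densities."""
    n = len(rho)
    return [rho[x] + dt * (sum(K[x][y] * rho[y] for y in range(n)) - rho[x])
            for x in range(n)]

def heat_flow(rho0, t_end, dt=1e-3):
    """Return the list of Euler iterates approximating rho_t on [0, t_end]."""
    path = [rho0]
    for _ in range(int(round(t_end / dt))):
        path.append(heat_step(path[-1], dt))
    return path

path = heat_flow([4.0, 0.0, 0.0], t_end=10.0)   # start from the Dirac mass at state 0
entropies = [entropy(rho) for rho in path]
```

In this run the entropy decreases monotonically from $\log 4$ towards $0$, and the density approaches the constant density $1$.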

Remark. The choice of the logarithmic mean is essential in Theorem 1.2 if one wishes to identify the heat flow as the gradient flow of the entropy associated with the function $f(\rho) = \rho\log\rho$. In Section 4 we prove analogous results for certain different functions $f$, if one replaces the logarithmic mean by $\theta(s,t) = \frac{s-t}{f'(s)-f'(t)}$. The appearance of the logarithmic mean in discrete heat flow problems is not surprising. In fact, the "Log Mean Temperature Difference", usually called LMTD, plays an important role in the engineering literature on heat and mass transfer problems (see, e.g., [18]), in particular in heat flow through long cylinders (see also [4, Section 4.5] for a discussion).

Remark. For Markov chains on a two-point space $\{-1,1\}$ we shall show in Section 2 that (under mild additional assumptions) the metric $\mathcal{W}$ is the unique metric for which the gradient flow of the entropy coincides with the heat flow. We refer to Proposition 2.13 below for a precise statement.

Ricci curvature in a discrete setting. A synthetic theory of Ricci curvature in metric measure spaces has been developed recently by Lott-Sturm-Villani [17, 26]. These authors defined lower bounds on the Ricci curvature of a geodesic metric measure space in terms of convexity properties of the entropy functional along geodesics in the $L^2$-Wasserstein metric. There has long been interest in defining and studying a notion of Ricci curvature on discrete spaces, but unfortunately the Lott-Sturm-Villani definition cannot be applied directly. The reason is that geodesics in the $L^2$-Wasserstein space typically do not exist if the underlying metric space is discrete, even in the simplest possible example of the two-point space (see Section 2 below for more details).

The metric $\mathcal{W}$ constructed in this paper does not have this defect. By a lower-semicontinuity argument it can be shown that every pair of probability densities in $\mathscr{P}(\mathcal{X})$ can be joined by a constant speed geodesic. Since $\mathcal{W}$ takes over the role of the $L^2$-Wasserstein metric if $\theta$ is the logarithmic mean, the following modification of the Lott-Sturm-Villani definition of Ricci curvature seems natural:

Definition 1.3 (Ricci curvature lower bound). Let $K = (K(x,y))_{x,y\in\mathcal{X}}$ be an irreducible and reversible Markov kernel on a finite space $\mathcal{X}$. Then $K$ is said to have Ricci curvature bounded from below by $\kappa\in\mathbb{R}$, if for every $\bar\rho_0,\bar\rho_1\in\mathscr{P}(\mathcal{X})$ there exists a constant speed geodesic $(\rho_t)_{t\in[0,1]}$ in $(\mathscr{P}(\mathcal{X}),\mathcal{W})$ satisfying $\rho_0 = \bar\rho_0$, $\rho_1 = \bar\rho_1$, and

\[ \mathcal{H}(\rho_t) \le (1-t)\mathcal{H}(\rho_0) + t\mathcal{H}(\rho_1) - \frac{\kappa}{2}t(1-t)\mathcal{W}(\rho_0,\rho_1)^2 \]

for all $t\in[0,1]$. We set

\[ \operatorname{Ric}(K) := \sup\{\kappa\in\mathbb{R} : K \text{ has Ricci curvature bounded from below by } \kappa\}. \]

Calculating or estimating $\operatorname{Ric}(K)$ in concrete situations does not appear to be an easy task. We shall address this topic in a forthcoming publication.

Several other approaches to Ricci curvature in a discrete setting have been considered recently.

Bonciocat and Sturm [5] adapted the definition based on displacement convexity of the entropy from [17, 26] to the discrete setting. The non-existence of geodesics in the $L^2$-Wasserstein space is circumvented by considering approximate midpoints between measures in the $L^2$-Wasserstein metric. Using this approach it is shown that certain planar graphs have non-negative Ricci curvature.

Ollivier [20, 21] defined a notion of Ricci curvature by comparing transportation distances between small balls and their centers. This notion coincides with the usual notion of Ricci curvature lower boundedness on Riemannian manifolds and is very well adapted to study Ricci curvature on discrete spaces. In particular, it is easy to show that the Ricci curvature of the $n$-dimensional discrete hypercube is proportional to $\frac1n$. However, as has been discussed in [22], the relation with displacement convexity remains to be clarified.

Very recently Y. Lin and S.-T. Yau [16] studied Ricci curvature on graphs by taking a characterisation in terms of the heat semigroup due to Bakry and Émery as a definition. With this definition it is shown that the Ricci curvature on locally finite graphs is bounded from below by $-1$.

Structure of the paper. Section 2 contains a detailed analysis of the metric $\mathcal{W}$ associated with Markov kernels on a two-point space. In Section 3 we study the metric $\mathcal{W}$ in a general setting and prove Theorem 1.1. In Section 4 we study gradient flows and present the proof of Theorem 1.2.

Note added. After completion of this paper, the author has been informed about the recent preprint [7] where a related class of metrics has been studied independently. The results obtained in both papers are largely complementary.

Acknowledgement. The author is grateful to Matthias Erbar, Nicola Gigli, Nicolas Juillet, Giuseppe Savaré, and Karl-Theodor Sturm for stimulating discussions on this paper and related topics.

2. Analysis on the two-point space 

In this section we shall carry out a detailed analysis of the metric $\mathcal{W}$ in the simplest case of interest, where the underlying space is a two-point space, say $\mathcal{X} = \mathcal{Q}^1 = \{a,b\}$. The reason for discussing the two-point space separately is twofold. Firstly, it is possible to perform explicit calculations, which lead to simple proofs and more precise results than in the general case. Secondly, some of the results obtained in this section shall be used in Section 3, where results for more general Markov chains are obtained by comparison arguments involving Markov chains on a two-point space.

Markov chains on the two-point space. Consider a Markov kernel $K$ with transition probabilities

\[ K(a,b) = p, \qquad K(b,a) = q \tag{2.1} \]

for some $p,q\in(0,1]$. Then the associated continuous time semigroup $H(t) = e^{t(K-I)}$ is given by

\[ H(t) = \frac{1}{p+q}\begin{pmatrix} q & p\\ q & p \end{pmatrix} + \frac{e^{-(p+q)t}}{p+q}\begin{pmatrix} p & -p\\ -q & q \end{pmatrix}, \]

and the stationary distribution $\pi$ satisfies

\[ \pi(a) = \frac{q}{p+q}, \qquad \pi(b) = \frac{p}{p+q}. \]

Since $K(a,b)\pi(a) = K(b,a)\pi(b)$, we observe that $K$ is reversible. Every probability measure on $\mathcal{Q}^1$ is of the form $\frac12((1-\beta)\delta_a + (1+\beta)\delta_b)$ for some $\beta\in[-1,1]$. The corresponding density $\rho^\beta$ with respect to $\pi$ is then given by

\[ \rho^\beta(a) := \frac{p+q}{q}\,\frac{1-\beta}{2}, \qquad \rho^\beta(b) := \frac{p+q}{p}\,\frac{1+\beta}{2}. \]

It follows that $H(t)\rho^\beta = \rho^{\beta_t}$, where

\[ \beta_t := \frac{p-q}{p+q}\big(1-e^{-(p+q)t}\big) + \beta e^{-(p+q)t}, \tag{2.2} \]

thus $\beta_t$ solves the differential equation

\[ \dot\beta_t = p(1-\beta_t) - q(1+\beta_t). \tag{2.3} \]
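
The closed form (2.2) can be checked against both the differential equation (2.3) and the matrix formula for $H(t)$. The sketch below does this for arbitrarily chosen parameter values; the function names are illustrative and not part of the paper.

```python
import math

def beta_closed_form(beta0, p, q, t):
    """beta_t as given by (2.2)."""
    decay = math.exp(-(p + q) * t)
    return (p - q) / (p + q) * (1.0 - decay) + beta0 * decay

def beta_euler(beta0, p, q, t, steps=100000):
    """Explicit Euler integration of (2.3): beta' = p (1 - beta) - q (1 + beta)."""
    dt = t / steps
    beta = beta0
    for _ in range(steps):
        beta += dt * (p * (1.0 - beta) - q * (1.0 + beta))
    return beta

def density(beta, p, q):
    """The pair (rho^beta(a), rho^beta(b)), i.e. the density with respect to pi."""
    return ((p + q) / q * (1.0 - beta) / 2.0, (p + q) / p * (1.0 + beta) / 2.0)

def heat_semigroup(p, q, t):
    """The 2x2 matrix H(t) = exp(t (K - I)), rows and columns ordered as (a, b)."""
    e = math.exp(-(p + q) * t)
    s = p + q
    return [[(q + p * e) / s, (p - p * e) / s],
            [(q - q * e) / s, (p + q * e) / s]]

def apply(H, rho):
    """Apply the matrix H to the density rho, viewed as a function on {a, b}."""
    return (H[0][0] * rho[0] + H[0][1] * rho[1], H[1][0] * rho[0] + H[1][1] * rho[1])
```

For instance, with $p=0.3$, $q=0.6$ and $\beta=0.8$, applying `heat_semigroup(p, q, t)` to `density(0.8, p, q)` reproduces `density(beta_closed_form(0.8, p, q, t), p, q)` up to rounding.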



Remark 2.1 (Limitations of the $L^2$-Wasserstein distance). Before introducing a new class of (pseudo-)metrics on $\mathscr{P}(\mathcal{Q}^1)$, we shall argue why the $L^2$-Wasserstein metric $W_2$ is not appropriate for the purposes of this paper. First we shall show that, as we already mentioned in the introduction, the metric derivative of the heat flow is infinite with respect to the $L^2$-Wasserstein metric. To see this, take $\beta\in[-1,1]\setminus\{\frac{p-q}{p+q}\}$, and let $u(t) := H(t)\rho^\beta = \rho^{\beta_t}$ be the heat flow starting at $\rho^\beta$. Since $W_2(\rho^\alpha,\rho^\beta) = \sqrt{2|\alpha-\beta|}$ for $\alpha,\beta\in[-1,1]$, we have

\[ |u'|(t) := \limsup_{s\to t}\frac{W_2(u(t),u(s))}{|t-s|} = \sqrt2\,\limsup_{s\to t}\frac{\sqrt{|\beta_t-\beta_s|}}{|t-s|} = \sqrt{2\Big|\beta-\frac{p-q}{p+q}\Big|}\;\limsup_{s\to t}\frac{\sqrt{|e^{-(p+q)t}-e^{-(p+q)s}|}}{|t-s|} = +\infty. \]

In particular, the heat flow is not a curve of maximal slope (see, e.g., [1] for this concept of gradient flow) for any functional on $\mathscr{P}(\mathcal{Q}^1)$.

Furthermore, the Lott-Sturm-Villani definition of Ricci curvature [17, 26] cannot be applied in the discrete setting, since $W_2$-geodesics between distinct elements of $\mathscr{P}(\mathcal{Q}^1)$ do not exist. To see this, let $(\rho^{\beta(t)})_{0\le t\le1}$ be a constant speed geodesic in $\mathscr{P}(\mathcal{Q}^1)$. For $s,t\in[0,1]$ we then have

\[ \sqrt{2|\beta(t)-\beta(s)|} = W_2(\rho^{\beta(t)},\rho^{\beta(s)}) = |t-s|\,W_2(\rho^{\beta(0)},\rho^{\beta(1)}) = |t-s|\sqrt{2|\beta(0)-\beta(1)|}, \]

which implies that $t\mapsto\beta(t)$ is 2-Hölder, hence constant on $[0,1]$. It thus follows that all constant speed $W_2$-geodesics are constant.



ENTROPY GRADIENT FLOWS FOR MARKOV CHAINS

A new metric. Given a fixed Markov chain $K$ on $\{a,b\}$ we shall define a (pseudo-)metric $\mathcal{W}$ on $\mathscr{P}(\{a,b\})$ that depends on the choice of a function $\theta : \mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$. The following assumptions will be in force throughout this section:

Assumption 2.2. The function $\theta : [0,\infty)\times[0,\infty)\to[0,\infty)$ has the following properties:

(A1) $\theta$ is continuous on $[0,\infty)\times[0,\infty)$;

(A2) $\theta$ is continuously differentiable on $(0,\infty)\times(0,\infty)$;

(A3) $\theta(s,t) = \theta(t,s)$ for $s,t\ge0$;

(A4) $\theta(s,t) > 0$ for $s,t>0$.

The most interesting choice for the purposes of this paper is the case where $\theta$ is the logarithmic mean defined by $\theta(s,t) := \int_0^1 s^{1-p}t^p\,dp$. To simplify notation we define, for $\beta\in[-1,1]$,

\[ \hat\rho(\beta) := \theta(\rho^\beta(a),\rho^\beta(b)). \]

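On the two-point space the variational definition of $\mathcal{W}$ given in the introduction can be simplified as follows:

Lemma 2.3. For $\alpha,\beta\in[-1,1]$ we have

\[ \mathcal{W}(\rho^\alpha,\rho^\beta)^2 = \frac14\Big(\frac1p+\frac1q\Big)\inf\Big\{ \int_0^1 \frac{\dot\beta_t^2}{\hat\rho(\beta_t)}\,dt \Big\}, \tag{2.4} \]

where the infimum runs over all piecewise $C^1$-functions $t\mapsto\beta_t$ from $[0,1]$ to $[-1,1]$ satisfying $\beta_0=\alpha$ and $\beta_1=\beta$.

Proof. Substituting $\chi_t = \psi_t(b)-\psi_t(a)$ in the definition of $\mathcal{W}$, one obtains

\[ \mathcal{W}(\rho^\alpha,\rho^\beta)^2 = \inf\Big\{ \frac{pq}{p+q}\int_0^1 \hat\rho(\beta_t)\chi_t^2\,dt \Big\}, \]

where the infimum runs over all piecewise $C^1$-functions $t\mapsto\beta_t$ from $[0,1]$ to $[-1,1]$ and all measurable functions $\chi : [0,1]\to\mathbb{R}$ satisfying $\beta_0=\alpha$, $\beta_1=\beta$ and

\[ \dot\beta_t = \frac{2pq}{p+q}\,\hat\rho(\beta_t)\chi_t. \]

The result follows by inserting the latter constraint in the expression for $\mathcal{W}(\rho^\alpha,\rho^\beta)$. □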

Lemma 2.3 provides a representation of $\mathcal{W}(\rho^\alpha,\rho^\beta)$ in terms of a one-dimensional variational problem. Note that some care needs to be taken when solving this problem, since for some choices of $\theta$ (including the logarithmic mean) the denominator in (2.4) tends to 0 as $\beta_t$ tends to $\pm1$. The following result provides an explicit formula for $\mathcal{W}$:

Theorem 2.4. For $-1\le\alpha\le\beta\le1$ we have

\[ \mathcal{W}(\rho^\alpha,\rho^\beta) = \frac12\sqrt{\frac1p+\frac1q}\int_\alpha^\beta \frac{1}{\sqrt{\hat\rho(r)}}\,dr \in [0,\infty]. \]

Proof. Suppose first that $\alpha$ and $\beta$ belong to $(-1,1)$. (If $\hat\rho$ is bounded away from 0, this distinction is not necessary.) It is easily checked that the infimum in (2.4) may be restricted to monotone functions. Since $g : r\mapsto\frac{1}{\hat\rho(r)}$ is bounded on compact intervals in $(-1,1)$, (2.4) reduces to an elementary one-dimensional variational problem, which admits a minimizer, say $\xi$, that solves the Euler-Lagrange equation

\[ 2\ddot\xi_t\,g(\xi_t) + \dot\xi_t^2\,g'(\xi_t) = 0. \]

This equation implies that $t\mapsto\dot\xi_t\sqrt{g(\xi_t)}$ is constant, say equal to $C$. Since $\alpha<\beta$, it follows that $C>0$. We infer that

\[ \mathcal{W}(\rho^\alpha,\rho^\beta)^2 = \frac{p+q}{4pq}\int_0^1 \frac{\dot\xi_t^2}{\hat\rho(\xi_t)}\,dt = \frac{p+q}{4pq}\,C^2. \]

Moreover, $\xi$ is monotone, hence invertible. It follows from the inverse function theorem that its inverse $\gamma : [\alpha,\beta]\to[0,1]$ satisfies $\gamma'(r) = C^{-1}\sqrt{g(r)}$. We thus obtain

\[ 1 = \gamma(\beta)-\gamma(\alpha) = \int_\alpha^\beta \gamma'(r)\,dr = C^{-1}\int_\alpha^\beta \frac{1}{\sqrt{\hat\rho(r)}}\,dr, \]

hence

\[ C = \int_\alpha^\beta \frac{1}{\sqrt{\hat\rho(r)}}\,dr, \]

which implies the desired identity.

The general case $-1\le\alpha\le\beta\le1$ follows from a straightforward continuity argument. □
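
The formula of Theorem 2.4 is easy to evaluate numerically. The following sketch assumes that $\theta$ is the logarithmic mean and uses the midpoint rule; all function names are illustrative. It also evaluates the functional in (2.4) along the straight-line path $\beta_t=\alpha+t(\beta-\alpha)$, which by the Cauchy-Schwarz inequality can never be smaller than the squared distance.

```python
import math

def log_mean(s, t):
    """Logarithmic mean (s - t) / log(s / t), evaluated stably via log1p."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    x = (s - t) / t
    return s if x == 0.0 else (s - t) / math.log1p(x)

def rho_hat(beta, p, q):
    """rho_hat(beta) = theta(rho^beta(a), rho^beta(b)) on the two-point space."""
    a = (p + q) / q * (1.0 - beta) / 2.0
    b = (p + q) / p * (1.0 + beta) / 2.0
    return log_mean(a, b)

def distance(alpha, beta, p, q, n=20000):
    """Theorem 2.4 via the midpoint rule (assumes -1 < alpha <= beta < 1)."""
    h = (beta - alpha) / n
    integral = h * sum(1.0 / math.sqrt(rho_hat(alpha + (i + 0.5) * h, p, q))
                       for i in range(n))
    return 0.5 * math.sqrt(1.0 / p + 1.0 / q) * integral

def action_of_linear_path(alpha, beta, p, q, n=20000):
    """Value of the functional in (2.4) along beta_t = alpha + t (beta - alpha)."""
    h = 1.0 / n
    speed_sq = (beta - alpha) ** 2
    integral = h * sum(speed_sq / rho_hat(alpha + (i + 0.5) * h * (beta - alpha), p, q)
                       for i in range(n))
    return 0.25 * (1.0 / p + 1.0 / q) * integral
```

Since the distance is an integral over $[\alpha,\beta]$, it is additive along intermediate points, which is what the isometry with an interval described below expresses.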

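For $\beta\in[-1,1]$ it will be useful to define

\[ \varphi(\beta) := \frac12\sqrt{\frac1p+\frac1q}\int_0^\beta \frac{1}{\sqrt{\hat\rho(r)}}\,dr \in [-\infty,\infty], \tag{2.5} \]

so that Theorem 2.4 implies that

\[ \mathcal{W}(\rho^\alpha,\rho^\beta) = |\varphi(\alpha)-\varphi(\beta)| \]

for $\alpha,\beta\in[-1,1]$. It follows from the assumptions on $\theta$ that $\varphi$ is real-valued, continuous and strictly increasing on $(-1,1)$. Moreover, $\varphi(\pm1) = \lim_{\beta\to\pm1}\varphi(\beta)$ is possibly $\pm\infty$, depending on the behaviour of $\theta$ near 0.

In order to avoid having to distinguish between several cases in the results below, we set

\[ (-1,1)_* := \{\beta\in[-1,1] : |\varphi(\beta)|<\infty\}, \qquad I := \{\varphi(\beta) : \beta\in(-1,1)_*\}, \]

and

\[ \mathscr{P}_1(\mathcal{Q}^1) := \{\rho^\beta\in\mathscr{P}(\mathcal{Q}^1) : \beta\in(-1,1)_*\}. \]

It follows from the remarks above that $(-1,1)\subseteq(-1,1)_*\subseteq[-1,1]$ and that $I$ is a (possibly infinite) closed interval in $\mathbb{R}$. The following result, which summarises this discussion, is now obvious:

Proposition 2.5. The function $\mathcal{W}$ defines a pseudo-metric on $\mathscr{P}(\mathcal{Q}^1)$ that restricts to a metric on $\mathscr{P}_1(\mathcal{Q}^1)$. The mapping

\[ J : \rho^\beta\mapsto\varphi(\beta) \]

defines an isometry from $(\mathscr{P}_1(\mathcal{Q}^1),\mathcal{W})$ onto $I$ endowed with the euclidean metric. In particular, $(\mathscr{P}_1(\mathcal{Q}^1),\mathcal{W})$ is complete.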
The most interesting case for the purposes of this paper is the following: 




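Example 2.6 (Logarithmic mean). If $\theta$ is the logarithmic mean, i.e., $\theta(s,t) = \int_0^1 s^{1-r}t^r\,dr$, then $\hat\rho(-1) = \hat\rho(1) = 0$ and for $\beta\in(-1,1)$ we have

\[ \hat\rho(\beta) = \frac{p+q}{2pq}\,\frac{q(1+\beta)-p(1-\beta)}{\log q(1+\beta)-\log p(1-\beta)}. \]

In this case we have $(-1,1)_* = [-1,1]$ and $I = [\varphi(-1),\varphi(1)]$ is a compact interval. Furthermore, for $-1\le\alpha\le\beta\le1$,

\[ \mathcal{W}(\rho^\alpha,\rho^\beta) = \frac{1}{\sqrt2}\int_\alpha^\beta \sqrt{\frac{\log q(1+r)-\log p(1-r)}{q(1+r)-p(1-r)}}\,dr. \]

If moreover $p=q$, we have

\[ \hat\rho(\beta) = \frac{\beta}{\operatorname{arctanh}\beta} \]

and

\[ \mathcal{W}(\rho^\alpha,\rho^\beta) = \frac{1}{\sqrt{2p}}\int_\alpha^\beta \sqrt{\frac{\operatorname{arctanh}r}{r}}\,dr. \]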

Recall that a constant speed geodesic in a metric space $(M,d)$ is a curve $u : [0,1]\to M$ satisfying

\[ d(u(s),u(t)) = |t-s|\,d(u(0),u(1)) \]

for all $s,t\in[0,1]$.

The next result gives a characterisation of $\mathcal{W}$-geodesics in $\mathscr{P}_1(\mathcal{Q}^1)$.

Proposition 2.7 (Characterisation of geodesics). Let $\rho^\alpha,\rho^\beta\in\mathscr{P}_1(\mathcal{Q}^1)$. There exists a unique constant speed geodesic $(\rho^{\gamma(t)})_{0\le t\le1}$ in $\mathscr{P}_1(\mathcal{Q}^1)$ with $\rho^{\gamma(0)} = \rho^\alpha$ and $\rho^{\gamma(1)} = \rho^\beta$. Moreover, the function $\gamma$ belongs to $C^1([0,1];\mathbb{R})$ and satisfies the differential equation

\[ \gamma'(t) = 2w\sqrt{\frac{pq}{p+q}\,\hat\rho(\gamma(t))} \tag{2.6} \]

for $t\in[0,1]$, where $w := \operatorname{sgn}(\beta-\alpha)\mathcal{W}(\rho^\alpha,\rho^\beta)$.

Proof. Since the mapping $J$ is an isometry from $\mathscr{P}_1(\mathcal{Q}^1)$ onto $I$, existence and uniqueness of geodesics follow directly from the corresponding facts in $I$.

Take now $\alpha,\beta\in(-1,1)_*$ and let $\gamma\in C^1([0,1];\mathbb{R})$ be the solution to (2.6) with initial condition $\gamma(0)=\alpha$. For $0\le s<t\le1$ we then obtain by (2.5),

\[ \varphi(\gamma(t))-\varphi(\gamma(s)) = \int_s^t \varphi'(\gamma(r))\gamma'(r)\,dr = w(t-s), \]

which implies that $\mathcal{W}(\rho^{\gamma(t)},\rho^{\gamma(s)}) = |w|(t-s)$ and $\gamma(1)=\beta$, hence $t\mapsto\rho^{\gamma(t)}$ is a constant speed geodesic between $\rho^\alpha$ and $\rho^\beta$. □
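
The geodesic equation (2.6) can be integrated numerically. The sketch below assumes the logarithmic mean and uses a classical fourth-order Runge-Kutta scheme, which is a choice made here for illustration and not something prescribed by the paper; the function names are hypothetical. The endpoint $\gamma(1)$ should reproduce $\beta$, and the midpoint $\gamma(\frac12)$ should lie at half the distance from $\rho^\alpha$.

```python
import math

def log_mean(s, t):
    """Logarithmic mean, extended by 0 when an argument is not positive."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    x = (s - t) / t
    return s if x == 0.0 else (s - t) / math.log1p(x)

def rho_hat(beta, p, q):
    a = (p + q) / q * (1.0 - beta) / 2.0
    b = (p + q) / p * (1.0 + beta) / 2.0
    return log_mean(a, b)

def distance(alpha, beta, p, q, n=4000):
    """W(rho^alpha, rho^beta) from Theorem 2.4, midpoint rule, any order of endpoints."""
    h = (beta - alpha) / n
    total = sum(1.0 / math.sqrt(rho_hat(alpha + (i + 0.5) * h, p, q)) for i in range(n))
    return 0.5 * math.sqrt(1.0 / p + 1.0 / q) * abs(h) * total

def geodesic(alpha, beta, p, q, steps=1000):
    """Integrate (2.6) with RK4 from gamma(0) = alpha; returns gamma on a uniform grid."""
    w = math.copysign(distance(alpha, beta, p, q), beta - alpha)
    c = 2.0 * w * math.sqrt(p * q / (p + q))
    velocity = lambda g: c * math.sqrt(rho_hat(g, p, q))
    dt = 1.0 / steps
    gamma = [alpha]
    for _ in range(steps):
        g = gamma[-1]
        k1 = velocity(g)
        k2 = velocity(g + 0.5 * dt * k1)
        k3 = velocity(g + 0.5 * dt * k2)
        k4 = velocity(g + dt * k3)
        gamma.append(g + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return gamma
```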
Gradient flows. In order to identify the heat flow as a gradient flow in $\mathscr{P}(\mathcal{Q}^1)$, we make the following assumption:

Assumption 2.8. In addition to (A1) - (A4) we assume that there exists a function $f\in C([0,\infty);\mathbb{R})\cap C^2((0,\infty);\mathbb{R})$ satisfying $f''(t)>0$ for $t>0$, and

\[ \theta(s,t) = \frac{s-t}{f'(s)-f'(t)} \tag{2.7} \]

for all $s,t>0$ with $s\ne t$.

Example 2.9. Note that this assumption is satisfied in Example 2.6 with $f(t) = t\log t$.

Consider the functional $\mathcal{F} : \mathscr{P}(\mathcal{Q}^1)\to\mathbb{R}$ defined by

\[ \mathcal{F}(\rho) := \sum_{x\in\mathcal{Q}^1} f(\rho(x))\pi(x), \]

where $f : \mathbb{R}_+\to\mathbb{R}$ has been defined above. It thus follows that

\[ \mathcal{F}(\rho^\beta) = \frac{q}{p+q}f(\rho^\beta(a)) + \frac{p}{p+q}f(\rho^\beta(b)). \tag{2.8} \]

Proposition 2.5 implies that $(\mathscr{P}_1(\mathcal{Q}^1),\mathcal{W})$ is a complete 1-dimensional Riemannian manifold, which has a boundary if and only if $(-1,1)$ is a proper subset of $(-1,1)_*$. In particular, it makes sense to study gradient flows in $(\mathscr{P}_1(\mathcal{Q}^1),\mathcal{W})$.



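Proposition 2.10 (Heat flow is the gradient flow of the entropy). For $\beta\in(-1,1)_*$, let $u : t\mapsto\rho^{\beta_t} = H(t)\rho^\beta$ be the heat flow trajectory starting from $\rho^\beta$. Then $u$ is a gradient flow trajectory of the functional $\mathcal{F}$ in the Riemannian manifold $(\mathscr{P}_1(\mathcal{Q}^1),\mathcal{W})$.

Proof. Recall that the function $J : \rho^\beta\mapsto\varphi(\beta)$ maps $\mathscr{P}_1(\mathcal{Q}^1)$ isometrically onto a closed interval $I\subseteq\mathbb{R}$. Therefore it suffices to show that the gradient flow equation

\[ \frac{d}{dt}\varphi(\beta_t) = -\tilde{\mathcal{F}}'(\varphi(\beta_t)) \tag{2.9} \]

holds for $t>0$, where $\tilde{\mathcal{F}} := \mathcal{F}\circ J^{-1}$. To prove this, we set

\[ \ell(\beta) := \rho^\beta(a), \qquad r(\beta) := \rho^\beta(b) \]

for brevity. Using (2.5) and (2.7) we obtain

\[ \varphi'(\beta) = \frac12\sqrt{\frac{p+q}{pq}}\,\sqrt{\frac{f'(r(\beta))-f'(\ell(\beta))}{r(\beta)-\ell(\beta)}}. \tag{2.10} \]

Since

\[ \tilde{\mathcal{F}}(\varphi(\beta)) = \mathcal{F}(J^{-1}(\varphi(\beta))) = \mathcal{F}(\rho^\beta) = \frac{q}{p+q}f(\ell(\beta)) + \frac{p}{p+q}f(r(\beta)), \]

it follows that $\tilde{\mathcal{F}}$ is continuously differentiable on $I$ and

\[ \tilde{\mathcal{F}}'(\varphi(\beta)) = \frac{f'(r(\beta))-f'(\ell(\beta))}{2\varphi'(\beta)} = \sqrt{\frac{pq}{p+q}}\,(r(\beta)-\ell(\beta))\sqrt{\frac{f'(r(\beta))-f'(\ell(\beta))}{r(\beta)-\ell(\beta)}}. \]

On the other hand, (2.3) and (2.10) imply that

\[ \frac{d}{dt}\varphi(\beta_t) = \big(p(1-\beta_t)-q(1+\beta_t)\big)\varphi'(\beta_t) = -\frac{2pq}{p+q}\,(r(\beta_t)-\ell(\beta_t))\,\varphi'(\beta_t) = -\sqrt{\frac{pq}{p+q}}\,(r(\beta_t)-\ell(\beta_t))\sqrt{\frac{f'(r(\beta_t))-f'(\ell(\beta_t))}{r(\beta_t)-\ell(\beta_t)}}. \]

Combining the latter two identities we obtain (2.9), which completes the proof. □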
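
The identity (2.9) can be tested numerically for $f(t)=t\log t$ and the logarithmic mean. In the sketch below, which uses illustrative function names and is not part of the paper, the left-hand side is computed from (2.3) and the derivative of $\varphi$, while the right-hand side uses a central finite difference of $\mathcal{F}(\rho^\beta)$ in $\beta$, so the two sides are computed by different routes.

```python
import math

def log_mean(s, t):
    """Logarithmic mean (s - t) / log(s / t), evaluated stably via log1p."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    x = (s - t) / t
    return s if x == 0.0 else (s - t) / math.log1p(x)

def density(beta, p, q):
    """The pair (rho^beta(a), rho^beta(b))."""
    return ((p + q) / q * (1.0 - beta) / 2.0, (p + q) / p * (1.0 + beta) / 2.0)

def entropy(beta, p, q):
    """F(rho^beta) from (2.8) with f(t) = t log t (requires -1 < beta < 1)."""
    a, b = density(beta, p, q)
    return (q * a * math.log(a) + p * b * math.log(b)) / (p + q)

def phi_prime(beta, p, q):
    """Derivative of phi from (2.5): (1/2) sqrt(1/p + 1/q) / sqrt(rho_hat(beta))."""
    a, b = density(beta, p, q)
    return 0.5 * math.sqrt(1.0 / p + 1.0 / q) / math.sqrt(log_mean(a, b))

def heat_flow_speed(beta, p, q):
    """d/dt phi(beta_t) at beta_t = beta, using (2.3) for the time derivative."""
    return phi_prime(beta, p, q) * (p * (1.0 - beta) - q * (1.0 + beta))

def minus_gradient(beta, p, q, h=1e-5):
    """-(dF/dbeta) / phi'(beta), with dF/dbeta approximated by central differences."""
    dF = (entropy(beta + h, p, q) - entropy(beta - h, p, q)) / (2.0 * h)
    return -dF / phi_prime(beta, p, q)
```

At the equilibrium value $\beta=\frac{p-q}{p+q}$ both sides vanish, as expected for a stationary point of the flow.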

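In order to investigate the convexity of $\mathcal{F}$ along $\mathcal{W}$-geodesics, we consider the function $K : (-1,1)\to\mathbb{R}$ defined by

\[ K(\beta) := \frac{p+q}{2} + \frac12\hat\rho(\beta)\Big( q f''(\rho^\beta(b)) + p f''(\rho^\beta(a)) \Big) \]

and

\[ \kappa := \inf\{K(\beta) : \beta\in(-1,1)\}. \tag{2.11} \]

Since $f''>0$, it follows that $\kappa\ge\frac{p+q}{2}$.

Remark 2.11. If $f(\rho) = \rho\log\rho$, straightforward calculus shows that

\[ K(\beta) = \frac{p+q}{2} + \frac{1}{1-\beta^2}\,\frac{q(1+\beta)-p(1-\beta)}{\log q(1+\beta)-\log p(1-\beta)}. \]

If moreover $p=q$, one has

\[ K(\beta) = p + \frac{p\,\beta}{(1-\beta^2)\operatorname{arctanh}\beta}. \]

It turns out that $\kappa$ determines the convexity of the functional $\mathcal{F}$: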

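Proposition 2.12 (Convexity of $\mathcal{F}$ along $\mathcal{W}$-geodesics). Let $\kappa$ be defined by (2.11). The functional $\mathcal{F}$ is $\kappa$-convex along geodesics. More explicitly, let $\bar\rho_0,\bar\rho_1\in\mathscr{P}_1(\mathcal{Q}^1)$ and let $(\rho_t)_{0\le t\le1}$ be the unique constant speed geodesic satisfying $\rho_0 = \bar\rho_0$ and $\rho_1 = \bar\rho_1$. Then the inequality

\[ \mathcal{F}(\rho_t) \le (1-t)\mathcal{F}(\rho_0) + t\mathcal{F}(\rho_1) - \frac{\kappa}{2}t(1-t)\mathcal{W}(\rho_0,\rho_1)^2 \]

holds for all $t\in[0,1]$.

Proof. Let $\alpha,\beta\in(-1,1)_*$ be such that $\bar\rho_0 = \rho^\alpha$ and $\bar\rho_1 = \rho^\beta$ and set $w := \mathcal{W}(\rho^\alpha,\rho^\beta)$. Without loss of generality we assume that $\alpha<\beta$. Proposition 2.7 implies that $\rho_t = \rho^{\gamma(t)}$, where $\gamma$ satisfies (2.6).

Set $\zeta(t) := \mathcal{F}(\rho_t)$. It suffices to show that $\zeta''(t)\ge w^2\kappa$ for $t\in[0,1]$. By (2.8) we have

\[ \zeta'(t) = \frac12\gamma'(t)\Big( f'(\rho^{\gamma(t)}(b)) - f'(\rho^{\gamma(t)}(a)) \Big), \]

and therefore (2.6) implies that

\[ \zeta'(t) = w\sqrt{\frac{pq}{p+q}\,\hat\rho(\gamma(t))}\,\Big( f'(\rho^{\gamma(t)}(b)) - f'(\rho^{\gamma(t)}(a)) \Big). \]

Differentiating this identity and using (2.6) once more, we obtain

\[ \zeta''(t) = w^2K(\gamma(t)) \ge w^2\kappa, \]

which completes the proof. □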
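
The convexity constant $\kappa$ from (2.11) can be estimated by minimising $K(\beta)$ over a grid. The sketch below assumes $f(\rho)=\rho\log\rho$ and uses the formula of Remark 2.11; the grid search and the function names are illustrative choices, not part of the paper. In the symmetric case $p=q$ the grid minimum is attained at $\beta=0$, where $K(0)=2p$.

```python
import math

def log_mean(s, t):
    """Logarithmic mean (s - t) / log(s / t), evaluated stably via log1p."""
    if s <= 0.0 or t <= 0.0:
        return 0.0
    x = (s - t) / t
    return s if x == 0.0 else (s - t) / math.log1p(x)

def curvature_profile(beta, p, q):
    """K(beta) for f(rho) = rho log rho, as in Remark 2.11 (requires -1 < beta < 1)."""
    # The quotient in Remark 2.11 is the logarithmic mean of q(1+beta) and p(1-beta).
    mean = log_mean(q * (1.0 + beta), p * (1.0 - beta))
    return 0.5 * (p + q) + mean / (1.0 - beta * beta)

def kappa_estimate(p, q, n=1999):
    """Minimum of K over the grid beta_i = -1 + 2 (i + 1) / (n + 1), i = 0, ..., n - 1."""
    grid = (-1.0 + 2.0 * (i + 1) / (n + 1) for i in range(n))
    return min(curvature_profile(b, p, q) for b in grid)
```

The grid minimum is an upper bound for the infimum in (2.11) and only approximates it; the lower bound $\kappa\ge\frac{p+q}{2}$ holds regardless.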



The question arises whether the metric $\mathcal{W}$ constructed above is the unique geodesic metric on $\mathscr{P}_1(\mathcal{Q}^1)$ for which the heat flow is the gradient flow of the entropy. The answer is affirmative, provided that one requires that the left part $\{\rho^\beta : \beta\le\bar\beta\}$ and the right part $\{\rho^\beta : \beta\ge\bar\beta\}$ of $\mathscr{P}_1(\mathcal{Q}^1)$ are patched together in a 'reasonable' way. Here $\bar\beta := \frac{p-q}{p+q}$, so that $\rho^{\bar\beta}$ corresponds to equilibrium. Such a condition is necessary, since the heat flow starting at $\rho^\beta$ with $\beta>\bar\beta$ does not 'see' the measures $\rho^\alpha$ with $\alpha<\bar\beta$, and vice versa.

A precise uniqueness statement is given below. Since we shall not use this result elsewhere in the paper, we postpone its technical proof to Appendix B, where the notions of 2-absolute continuity and $\mathrm{EVI}_0(\mathcal{F})$ are defined as well.

Proposition 2.13 (Uniqueness of the metric). Let $\mathcal{M}$ be a geodesic metric on $\mathscr{P}_1(\mathcal{Q}^1)$ with the following properties:

(1) For $\beta\in(-1,1)_*$, the heat flow $t\mapsto\rho^{\beta_t}$ given by (2.2) is a 2-absolutely continuous curve satisfying $\mathrm{EVI}_0(\mathcal{F})$.

(2) For $\alpha,\beta\in(-1,1)_*$ with $\alpha\le\bar\beta\le\beta$, we have

\[ \mathcal{M}(\rho^\alpha,\rho^\beta) = \mathcal{M}(\rho^\alpha,\rho^{\bar\beta}) + \mathcal{M}(\rho^{\bar\beta},\rho^\beta). \]

Then $\mathcal{M} = \mathcal{W}$.

Note that (1) and (2) of Proposition 2.13 are satisfied if $\mathcal{M} = \mathcal{W}$. Indeed, since $\mathcal{F}$ is convex by Proposition 2.12, (1) follows from [28, Proposition 23.1]. Furthermore (2) follows from the explicit expression of $\mathcal{W}$ obtained in Theorem 2.4.

3. A Wasserstein-like metric for Markov chains

In this section we consider a Markov kernel $K = (K(x,y))_{x,y\in\mathcal{X}}$ on a finite state space $\mathcal{X}$. We assume that $K$ is irreducible, and denote its unique steady state by $\pi$. For all $x\in\mathcal{X}$ we then have $\pi(x)>0$. We also assume that $K$ is reversible, or equivalently, that the detailed balance equations

\[ K(x,y)\pi(x) = K(y,x)\pi(y) \tag{3.1} \]

hold for all $x,y\in\mathcal{X}$.

Definition of the (pseudo-)metric. We start with the definition of a class of Wasserstein-like pseudo-metrics on $\mathscr{P}(\mathcal{X})$. As in Section 2, the metric depends on the choice of a function $\theta : \mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$, which we fix from now on. To simplify notation, we set

\[ \hat\rho(x,y) := \theta(\rho(x),\rho(y)) \]

for $\rho\in\mathscr{P}(\mathcal{X})$ and $x,y\in\mathcal{X}$.
Assumption 3.1. Throughout this section we shall assume that $\theta$ satisfies Assumption 2.2. In addition we impose the following assumptions:

(A5) (Zero at the boundary): $\theta(0,t) = 0$ for all $t\ge0$.

(A6) (Monotonicity): $\theta(r,t)\le\theta(s,t)$ for all $0\le r\le s$ and $t\ge0$.

(A7) (Doubling property): for any $T>0$ there exists a constant $C_d>0$ such that

\[ \theta(2s,2t) \le 2C_d\,\theta(s,t) \]

whenever $0\le s,t\le T$.

Remark 3.2. Actually, the additional assumptions (A5) - (A7) shall not be used until Theorem 3.12.

At some places, in particular in Lemmas 3.14 and 3.16 below, it is possible to obtain sharper results by imposing one or both of the following assumptions as well. Note that (A7') implies (A7).

(A7') (Positive homogeneity): $\theta(\lambda s,\lambda t) = \lambda\theta(s,t)$ for $\lambda>0$ and $s,t\ge0$.

(A8) (Concavity): the function $\theta : \mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$ is concave.

Observe that (A7') and (A8) hold if $\theta$ is the logarithmic mean.
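Definition 3.3 (of the pseudo-metric $\mathcal{W}$). For $\bar\rho_0,\bar\rho_1\in\mathscr{P}(\mathcal{X})$ we define

\[ \mathcal{W}(\bar\rho_0,\bar\rho_1)^2 := \inf\Big\{ \frac12\int_0^1 \sum_{x,y\in\mathcal{X}} (\psi_t(x)-\psi_t(y))^2 K(x,y)\hat\rho_t(x,y)\pi(x)\,dt \ :\ (\rho,\psi)\in\mathcal{CE}_1(\bar\rho_0,\bar\rho_1) \Big\}, \]

where, for $T>0$, $\mathcal{CE}_T(\bar\rho_0,\bar\rho_1)$ denotes the collection of pairs $(\rho,\psi)$ satisfying the following conditions:

(i) $\rho : [0,T]\to\mathbb{R}^{\mathcal{X}}$ is piecewise $C^1$;

(ii) $\rho_0 = \bar\rho_0$, $\rho_T = \bar\rho_1$;

(iii) $\rho_t\in\mathscr{P}(\mathcal{X})$ for all $t\in[0,T]$;

(iv) $\psi : [0,T]\to\mathbb{R}^{\mathcal{X}}$ is measurable; (3.2)

(v) For all $x\in\mathcal{X}$ and a.e. $t\in(0,T)$ we have

\[ \dot\rho_t(x) + \sum_{y\in\mathcal{X}} (\psi_t(y)-\psi_t(x))K(x,y)\hat\rho_t(x,y) = 0. \]

The latter equation may be thought of as a 'continuity equation'. For simplicity we shall often write

\[ \mathcal{CE}(\bar\rho_0,\bar\rho_1) := \mathcal{CE}_1(\bar\rho_0,\bar\rho_1). \]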

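Remark 3.4 (Matrix reformulation). It will be very useful to reformulate Definition 3.3 in terms of matrices. For $\rho \in \mathscr{P}(\mathcal{X})$ consider the matrices $A(\rho)$ and $B(\rho)$ in $\mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ defined by

\[ A_{x,y}(\rho) := \begin{cases} \sum_{z \neq x} K(x,z) \hat\rho(x,z) \pi(x), & x = y, \\ -K(x,y) \hat\rho(x,y) \pi(x), & x \neq y, \end{cases} \]

and

\[ B_{x,y}(\rho) := \begin{cases} \sum_{z \neq x} K(x,z) \hat\rho(x,z), & x = y, \\ -K(x,y) \hat\rho(x,y), & x \neq y. \end{cases} \]

Definition 3.3 can then be rewritten as

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 = \inf\Big\{ \int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt \;:\; (\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1) \Big\}, \tag{3.3} \]

and the 'continuity equation' in (3.2) reads as

\[ \dot\rho_t = B(\rho_t) \psi_t. \tag{3.4} \]

Here and in the sequel we use square brackets $[\cdot,\cdot]$ to denote the standard inner product in $\mathbb{R}^{\mathcal{X}}$. It follows from the detailed balance equations (3.1) that $A(\rho)$ is symmetric, but $B(\rho)$ is not necessarily symmetric. Since $\sum_{y \neq x} |A_{x,y}(\rho)| = A_{x,x}(\rho) \geq 0$ for all $x \in \mathcal{X}$, the matrix $A(\rho)$ is diagonally dominant, which implies that

\[ [A(\rho)\psi, \psi] \geq 0 \tag{3.5} \]

for all $\psi \in \mathbb{R}^{\mathcal{X}}$. Note that

\[ A(\rho) = \Pi B(\rho), \]

where the diagonal matrix $\Pi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ is defined by

\[ \Pi := \mathrm{diag}(\pi(x))_{x \in \mathcal{X}}. \]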
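The matrix reformulation can be made concrete on a small example. The sketch below is illustrative only: the three-state kernel `K`, its reversible measure `pi`, the density `rho`, the test vector `psi`, and the choice of the logarithmic mean for $\theta$ are all assumptions made for this example. It assembles $A(\rho)$ and $B(\rho)$ from their definitions and checks that $A(\rho)$ is symmetric, that $A(\rho) = \Pi B(\rho)$, and that $[A(\rho)\psi,\psi]$ agrees with the sum appearing in Definition 3.3.

```python
import math

def log_mean(s, t):
    # Logarithmic mean, with theta(s, s) = s and theta = 0 on the boundary.
    if s == 0.0 or t == 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

# Hypothetical example: nearest-neighbour walk on a path with three states.
K = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]
pi = [1 / 6, 1 / 2, 1 / 3]   # satisfies detailed balance with K
rho = [1.5, 0.9, 0.9]        # density w.r.t. pi: sum(rho * pi) = 1
n = len(pi)

def build(rho, weighted):
    # weighted=True assembles A(rho); weighted=False assembles B(rho).
    M = [[0.0] * n for _ in range(n)]
    for x in range(n):
        w = pi[x] if weighted else 1.0
        for y in range(n):
            if x != y:
                c = K[x][y] * log_mean(rho[x], rho[y]) * w
                M[x][y] = -c
                M[x][x] += c
    return M

def quad(M, u, v):
    # Standard inner product [M u, v].
    return sum(M[x][y] * u[y] * v[x] for x in range(n) for y in range(n))

A, B = build(rho, True), build(rho, False)
psi = [0.3, -1.2, 2.0]

# (1/2) sum_{x,y} (psi(x) - psi(y))^2 K(x,y) rho_hat(x,y) pi(x)
energy = 0.5 * sum((psi[x] - psi[y]) ** 2 * K[x][y]
                   * log_mean(rho[x], rho[y]) * pi[x]
                   for x in range(n) for y in range(n))

symmetric = all(abs(A[x][y] - A[y][x]) < 1e-12
                for x in range(n) for y in range(n))
a_is_pi_b = all(abs(A[x][y] - pi[x] * B[x][y]) < 1e-12
                for x in range(n) for y in range(n))
print(symmetric, a_is_pi_b, abs(quad(A, psi, psi) - energy) < 1e-12)
```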

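Geometric interpretation. Before continuing we present another, more geometric reformulation of Definition 3.3 which makes the connection to the Benamou-Brenier formula (1.2) (even) more apparent. We introduce some notation that will be used throughout the remainder of the paper.

For $\psi \in \mathbb{R}^{\mathcal{X}}$ we consider the discrete gradient $\nabla\psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ defined by

\[ \nabla\psi(x,y) := \psi(x) - \psi(y), \]

and for $\Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ we consider the divergence $\nabla \cdot \Psi \in \mathbb{R}^{\mathcal{X}}$ defined by

\[ (\nabla \cdot \Psi)(x) := \frac12 \sum_{y \in \mathcal{X}} (\Psi(y,x) - \Psi(x,y)) K(x,y) \in \mathbb{R}. \]

It is easily checked that the "integration by parts formula"

\[ \langle \nabla\psi, \Psi \rangle_\pi = -\langle \psi, \nabla \cdot \Psi \rangle_\pi \]

holds, where, for $\varphi, \psi \in \mathbb{R}^{\mathcal{X}}$ and $\Phi, \Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$,

\[ \langle \varphi, \psi \rangle_\pi = \sum_{x \in \mathcal{X}} \varphi(x) \psi(x) \pi(x), \qquad \langle \Phi, \Psi \rangle_\pi = \frac12 \sum_{x,y \in \mathcal{X}} \Phi(x,y) \Psi(x,y) K(x,y) \pi(x). \]

Furthermore, for $\rho \in \mathscr{P}(\mathcal{X})$ we write

\[ \langle \Phi, \Psi \rangle_\rho := \frac12 \sum_{x,y \in \mathcal{X}} \Phi(x,y) \Psi(x,y) K(x,y) \hat\rho(x,y) \pi(x), \qquad \|\Phi\|_\rho := \sqrt{\langle \Phi, \Phi \rangle_\rho}, \tag{3.6} \]

and note that $\langle \cdot,\cdot \rangle_\pi = \langle \cdot,\cdot \rangle_\rho$ if $\rho(x) = 1$ for all $x \in \mathcal{X}$.

For a probability density $\rho \in \mathscr{P}(\mathcal{X})$ we consider the matrix $\hat\rho \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ defined by

\[ \hat\rho(x,y) := \theta(\rho(x), \rho(y)). \]

Given two matrices $M, N \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, let $M \bullet N$ denote their entrywise product defined by

\[ (M \bullet N)(x,y) := M(x,y) N(x,y). \]

The definition of $\mathcal{W}$ can now be reformulated as follows:

Lemma 3.5 (Geometric reformulation). For $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ we have

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 = \inf\Big\{ \int_0^1 \|\nabla\psi_t\|_{\rho_t}^2 \, dt \;:\; (\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1) \Big\}, \]

and the differential equation in (3.2) can be rewritten as

\[ \dot\rho_t + \nabla \cdot (\hat\rho_t \bullet \nabla\psi_t) = 0. \tag{3.7} \]

Proof. This follows directly from the definitions. □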
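The following sketch checks two of the identities behind the geometric reformulation numerically. It is an illustration under stated assumptions: the sign conventions $\nabla\psi(x,y) = \psi(x) - \psi(y)$ and $(\nabla\cdot\Psi)(x) = \frac12\sum_y(\Psi(y,x) - \Psi(x,y))K(x,y)$ are taken as the working conventions, $\theta$ is the logarithmic mean, and the three-state chain, `rho`, `psi` and `Psi` are hypothetical data chosen for the example.

```python
import math

def log_mean(s, t):
    # Logarithmic mean, with theta(s, s) = s and theta = 0 on the boundary.
    if s == 0.0 or t == 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

# Hypothetical reversible three-state chain and density.
K = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]
pi = [1 / 6, 1 / 2, 1 / 3]
rho = [1.5, 0.9, 0.9]
n = len(pi)

def grad(psi):
    # Assumed convention: grad psi(x, y) = psi(x) - psi(y).
    return [[psi[x] - psi[y] for y in range(n)] for x in range(n)]

def div(Psi):
    # Assumed convention: (1/2) sum_y (Psi(y,x) - Psi(x,y)) K(x,y).
    return [0.5 * sum((Psi[y][x] - Psi[x][y]) * K[x][y] for y in range(n))
            for x in range(n)]

def inner_fn(u, v):
    return sum(u[x] * v[x] * pi[x] for x in range(n))

def inner_field(Phi, Psi):
    return 0.5 * sum(Phi[x][y] * Psi[x][y] * K[x][y] * pi[x]
                     for x in range(n) for y in range(n))

psi = [0.3, -1.2, 2.0]
Psi = [[0.0, 1.0, -2.0], [0.5, 0.0, 4.0], [3.0, -1.0, 0.0]]  # not a gradient

# Integration by parts: <grad psi, Psi>_pi = -<psi, div Psi>_pi.
lhs = inner_field(grad(psi), Psi)
rhs = -inner_fn(psi, div(Psi))

# Continuity equation: -div(rho_hat * grad psi) agrees with B(rho) psi.
g = grad(psi)
flux = [[log_mean(rho[x], rho[y]) * g[x][y] for y in range(n)]
        for x in range(n)]
minus_div = [-d for d in div(flux)]
B_psi = [sum(K[x][y] * log_mean(rho[x], rho[y]) * (psi[x] - psi[y])
             for y in range(n)) for x in range(n)]

print(abs(lhs - rhs) < 1e-12,
      all(abs(a - b) < 1e-12 for a, b in zip(minus_div, B_psi)))
```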

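For the $L^2$-Wasserstein metric on Euclidean space, it is well known that one can take the infimum in the Benamou-Brenier formula (1.2) over all vector fields $\Psi : \mathbb{R}^n \to \mathbb{R}^n$, rather than only considering gradients $\Psi = \nabla\psi$. In order to formulate a similar result in the discrete setting, we replace (iv) and (v) in (3.2) by

(iv') $\Psi : [0,T] \to \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$ is measurable;

(v') For all $x \in \mathcal{X}$ and a.e. $t \in (0,T)$ we have

\[ \dot\rho_t(x) + \frac12 \sum_{y \in \mathcal{X}} (\Psi_t(x,y) - \Psi_t(y,x)) K(x,y) \hat\rho_t(x,y) = 0; \]

and define

\[ \mathcal{CE}'_T(\bar\rho_0, \bar\rho_1) := \{ (\rho, \Psi) : \text{(i), (ii), (iii), (iv'), (v') hold} \}, \qquad \mathcal{CE}'(\bar\rho_0, \bar\rho_1) := \mathcal{CE}'_1(\bar\rho_0, \bar\rho_1). \]

With this notation the following result holds.

Lemma 3.6. For $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ we have

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 = \inf\Big\{ \frac12 \int_0^1 \sum_{x,y \in \mathcal{X}} \Psi_t(x,y)^2 K(x,y) \hat\rho_t(x,y) \pi(x) \, dt \;:\; (\rho, \Psi) \in \mathcal{CE}'(\bar\rho_0, \bar\rho_1) \Big\}. \]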

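Proof. As the inequality "$\geq$" is trivial, it suffices to prove the inequality "$\leq$". For this purpose, fix $\rho \in \mathscr{P}(\mathcal{X})$ and let $\mathcal{H}_\rho$ denote the set of all equivalence classes of functions $\Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$, where we identify functions that agree on $\{(x,y) \in \mathcal{X} \times \mathcal{X} : \hat\rho(x,y) K(x,y) > 0\}$. Endowed with the inner product $\langle \cdot,\cdot \rangle_\rho$ defined in (3.6), $\mathcal{H}_\rho$ is a finite-dimensional Hilbert space. The discrete gradient $\nabla\psi(x,y) := \psi(x) - \psi(y)$ defines a linear operator $\nabla : L^2(\mathcal{X}, \pi) \to \mathcal{H}_\rho$, whose adjoint is given by

\[ \nabla_\rho^* \Psi(x) := \frac12 \sum_{y \in \mathcal{X}} (\Psi(x,y) - \Psi(y,x)) K(x,y) \hat\rho(x,y). \tag{3.9} \]

Let $P_\rho$ denote the orthogonal projection in $\mathcal{H}_\rho$ onto the range of $\nabla$.

Now suppose that $((\rho_t), (\Psi_t)) \in \mathcal{CE}'(\bar\rho_0, \bar\rho_1)$ and let $\psi : [0,1] \to \mathbb{R}^{\mathcal{X}}$ be such that $P_{\rho_t} \Psi_t = \nabla\psi_t$ for $t \in [0,1]$. In view of the orthogonal decomposition

\[ \mathcal{H}_\rho = \mathrm{Ran}(\nabla) \oplus^\perp \mathrm{Ker}(\nabla_\rho^*), \tag{3.10} \]

it follows that $(I - P_{\rho_t}) \Psi_t \in \mathrm{Ker}(\nabla_{\rho_t}^*)$. This implies that $\nabla_{\rho_t}^* \Psi_t = \nabla_{\rho_t}^* (\nabla\psi_t)$, hence $(\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1)$. Using the decomposition (3.10) once more, we infer that $\langle \nabla\psi_t, \nabla\psi_t \rangle_{\rho_t} \leq \langle \Psi_t, \Psi_t \rangle_{\rho_t}$, from which the result follows. □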

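Remark 3.7 (Distance between positive measures). It is of course possible, and occasionally useful, to extend the definition of $\mathcal{W}(\rho_0, \rho_1)$ to densities $\rho_0, \rho_1 : \mathcal{X} \to \mathbb{R}_+$ having equal mass $m = \sum_{x \in \mathcal{X}} \rho_i(x) \pi(x) \in (0,\infty) \setminus \{1\}$. A straightforward argument based on Lemma 3.6 and the doubling property (A7) shows that

\[ c\, \mathcal{W}(\rho_0, \rho_1) \leq \mathcal{W}\big(\tfrac{1}{m}\rho_0, \tfrac{1}{m}\rho_1\big) \leq C\, \mathcal{W}(\rho_0, \rho_1), \]

where the constants $c, C > 0$ do not depend on $\rho_0$ and $\rho_1$. If (A7') holds, it follows that $\mathcal{W}(\rho_0, \rho_1) = \sqrt{m}\, \mathcal{W}\big(\tfrac{1}{m}\rho_0, \tfrac{1}{m}\rho_1\big)$.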

Basic properties of the metric. The main result of this subsection reads 
as follows: 

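Theorem 3.8. The mapping $\mathcal{W} : \mathscr{P}(\mathcal{X}) \times \mathscr{P}(\mathcal{X}) \to \mathbb{R}$ defines a pseudo-metric on $\mathscr{P}(\mathcal{X})$.

To prove this result we need some lemmas.

Lemma 3.9. For $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ and $T > 0$ we have

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1) = \inf\Big\{ \int_0^T [A(\rho_t)\psi_t, \psi_t]^{1/2} \, dt \;:\; (\rho, \psi) \in \mathcal{CE}_T(\bar\rho_0, \bar\rho_1) \Big\}. \]

Proof. This follows from a standard argument based on parametrisation by arc-length. We refer to [1, Lemma 1.1.4] or [9, Theorem 5.4] for the details in a very similar situation. □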

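The next lemma provides a lower bound for $\mathcal{W}$ in terms of the total variation distance, defined for $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ by

\[ d_{TV}(\rho_0, \rho_1) = \sum_{x \in \mathcal{X}} \pi(x) |\rho_0(x) - \rho_1(x)|. \]

Lemma 3.10 (Lower bound by total variation distance). For $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ we have

\[ d_{TV}(\bar\rho_0, \bar\rho_1) \leq \sqrt{2 \|\theta\|_\infty}\, \mathcal{W}(\bar\rho_0, \bar\rho_1), \]

where

\[ \|\theta\|_\infty = \sup\big\{ \theta(s,t) : 0 \leq s, t \leq \big(\min_{x \in \mathcal{X}} \pi(x)\big)^{-1} \big\}. \]

Proof. We assume that $\mathcal{W}(\bar\rho_0, \bar\rho_1) < \infty$, since otherwise there is nothing to prove. Let $\varepsilon > 0$, let $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ and take $(\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1)$ satisfying

\[ \int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt \leq \mathcal{W}^2(\bar\rho_0, \bar\rho_1) + \varepsilon. \tag{3.11} \]

For any $\varphi \in \mathbb{R}^{\mathcal{X}}$, the continuity equation (3.4) together with the identity $A(\rho) = \Pi B(\rho)$ yields

\[ \Big| \sum_{x \in \mathcal{X}} \varphi(x) (\bar\rho_0(x) - \bar\rho_1(x)) \pi(x) \Big| = \Big| \int_0^1 [\Pi\varphi, \dot\rho_t] \, dt \Big| = \Big| \int_0^1 [A(\rho_t)\varphi, \psi_t] \, dt \Big| \leq \Big( \int_0^1 [A(\rho_t)\varphi, \varphi] \, dt \Big)^{1/2} \Big( \int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt \Big)^{1/2}, \]

where the appeal to the Cauchy-Schwarz inequality is justified by (3.5). The first integrand on the right-hand side can be estimated crudely by

\[ [A(\rho_t)\varphi, \varphi] = \frac12 \sum_{x,y \in \mathcal{X}} (\varphi(x) - \varphi(y))^2 K(x,y) \hat\rho_t(x,y) \pi(x) \leq 2 \|\theta\|_\infty \|\varphi\|_\infty^2 \sum_{x,y \in \mathcal{X}} K(x,y) \pi(x) = 2 \|\theta\|_\infty \|\varphi\|_\infty^2, \]

where we used the stationarity of $\pi$ to obtain the latter identity. Taking (3.11) into account, and noting that $\varepsilon > 0$ is arbitrary, we thus obtain

\[ \Big| \sum_{x \in \mathcal{X}} \varphi(x) (\bar\rho_0(x) - \bar\rho_1(x)) \pi(x) \Big|^2 \leq 2 \|\theta\|_\infty \|\varphi\|_\infty^2\, \mathcal{W}(\bar\rho_0, \bar\rho_1)^2. \]

Using the duality between $\ell^1(\mathcal{X})$ and $\ell^\infty(\mathcal{X})$, the result follows. □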



Proof of Theorem 3.8. The symmetry of $\mathcal{W}$ is obvious, and Lemma 3.10 implies that $\mathcal{W}(\rho_0, \rho_1) > 0$ whenever $\rho_0 \neq \rho_1$. Finally, the triangle inequality easily follows using Lemma 3.9. □

Characterisation of finiteness. In the study of finiteness of the metric $\mathcal{W}$, a crucial role will be played by the quantity

\[ C_\theta := \int_0^1 \frac{1}{\sqrt{\theta(1-r, 1+r)}} \, dr \in [0, \infty]. \]

Note that $C_\theta = \sqrt{2}\, \varphi(1)$, where $\varphi$ denotes the function defined in (2.5) with $p = q = 1$. Therefore $C_\theta$ is finite if and only if Dirac measures on the two-point space lie at finite $\mathcal{W}$-distance from the uniform measure. Observe that $C_\theta < \infty$ if (A7') holds, since in that case

\[ \theta(1-r, 1+r) \geq \theta(1-r, 1-r) = (1-r)\, \theta(1,1) \]

for $r \in [0,1)$.
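To get a feeling for the quantity $C_\theta$, the sketch below approximates the integral $\int_0^1 \theta(1-r,1+r)^{-1/2}\,dr$ by a midpoint rule for two choices of $\theta$: the logarithmic mean and, as a comparison that is not taken from the text, the geometric mean. The function names and the number of quadrature nodes are assumptions made for the illustration; the integrand has an integrable singularity at $r = 1$, so the values are approximations only.

```python
import math

def log_mean(s, t):
    # Logarithmic mean, with theta(s, s) = s and theta = 0 on the boundary.
    if s == 0.0 or t == 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

def geo_mean(s, t):
    # Geometric mean, used here only as a comparison.
    return math.sqrt(s * t)

def c_theta(theta, nodes=100_000):
    # Midpoint rule for the integral of theta(1 - r, 1 + r)^(-1/2) over [0, 1].
    h = 1.0 / nodes
    total = 0.0
    for i in range(nodes):
        r = (i + 0.5) * h
        total += h / math.sqrt(theta(1.0 - r, 1.0 + r))
    return total

c_log = c_theta(log_mean)
c_geo = c_theta(geo_mean)
print(c_log, c_geo)
```

Since the logarithmic mean lies between the geometric and the arithmetic mean, one expects $1 \leq C_\theta \leq \int_0^1 (1-r^2)^{-1/4}\,dr$ for the logarithmic mean, and in particular $C_\theta < \infty$, in line with the observation above.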

The next result provides a characterisation of finiteness of the metric in terms of the support of the densities. For $\rho \in \mathscr{P}(\mathcal{X})$ we shall write

\[ \mathrm{supp}\, \rho := \{ x \in \mathcal{X} : \rho(x) > 0 \}. \]

Before stating the result we recall the following definition:

Definition 3.11. Let $\rho \in \mathscr{P}(\mathcal{X})$. For $x, y \in \mathcal{X}$ we write '$x \sim_\rho y$' if

(i) $x = y$; or,

(ii) there exist $k \geq 1$ and $x_1, \ldots, x_k \in \mathcal{X}$ such that

\[ \hat\rho(x, x_1) K(x, x_1),\; \hat\rho(x_1, x_2) K(x_1, x_2),\; \ldots,\; \hat\rho(x_k, y) K(x_k, y) > 0. \]

It is easy to see that for each $\rho \in \mathscr{P}(\mathcal{X})$, $\sim_\rho$ defines an equivalence relation on $\mathcal{X}$, which depends only on the support of $\rho$. Furthermore, if $\rho$ is strictly positive, then $x \sim_\rho y$ for any $x, y \in \mathcal{X}$, since $K$ is irreducible by assumption.

Now we are ready to state the main result of this subsection. 

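Theorem 3.12 (Characterisation of finiteness).

(1) If $C_\theta < \infty$, then $\mathcal{W}(\rho_0, \rho_1) < \infty$ for all $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$.

(2) If $C_\theta = \infty$, the following assertions are equivalent for $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$:

(a) $\mathcal{W}(\rho_0, \rho_1) < \infty$;

(b) For any $x \in \mathcal{X}$ we have

\[ \sum_{y \sim_{\rho_0} x} \rho_0(y) \pi(y) = \sum_{y \sim_{\rho_1} x} \rho_1(y) \pi(y). \tag{3.12} \]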

Before turning to the proof of this result we record some immediate con- 
sequences: 

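Corollary 3.13. Suppose that $C_\theta = \infty$. For $\rho_0, \rho_1 \in \mathscr{P}(\mathcal{X})$ the following assertions hold:

(1) If $\mathcal{W}(\rho_0, \rho_1) < \infty$, then $\mathrm{supp}\, \rho_0 = \mathrm{supp}\, \rho_1$.

(2) If $\mathrm{supp}\, \rho_0 = \mathrm{supp}\, \rho_1 = \mathcal{X}$, then $\mathcal{W}(\rho_0, \rho_1) < \infty$.

Proof. (1) Suppose that $\rho_0(x) = 0$ for a certain $x \in \mathcal{X}$. In view of (A5) it then follows that $x \not\sim_{\rho_0} y$ for any $y \neq x$, hence by Theorem 3.12,

\[ \rho_1(x) \pi(x) \leq \sum_{y \sim_{\rho_1} x} \rho_1(y) \pi(y) = \sum_{y \sim_{\rho_0} x} \rho_0(y) \pi(y) = \rho_0(x) \pi(x) = 0. \]

It follows that $\rho_1(x) = 0$, which shows that $\mathrm{supp}\, \rho_0 \supseteq \mathrm{supp}\, \rho_1$. The reverse inclusion follows by reversing the roles of $\rho_0$ and $\rho_1$.

(2) If $\mathrm{supp}\, \rho_0 = \mathrm{supp}\, \rho_1 = \mathcal{X}$, then $x \sim_{\rho_i} y$ for every $y \neq x$ and $i = 0, 1$ by irreducibility. It follows that

\[ \sum_{y \sim_{\rho_0} x} \rho_0(y) \pi(y) = 1 = \sum_{y \sim_{\rho_1} x} \rho_1(y) \pi(y), \]

hence $\mathcal{W}(\rho_0, \rho_1) < \infty$ by Theorem 3.12. □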
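Condition (3.12) is easy to test mechanically. The sketch below is an illustration, not the paper's construction: it assumes that $\theta(s,t) > 0$ exactly when $s, t > 0$ (as for the logarithmic mean), so that the classes of $\sim_\rho$ are the connected components of the graph with edges $\{x,y\}$ such that $K(x,y) > 0$ and $\rho(x), \rho(y) > 0$. The helper names and the three-state chain are hypothetical.

```python
def classes(K, rho):
    # Labels of the equivalence classes of ~_rho (simple union-find),
    # assuming theta(s, t) > 0 if and only if s > 0 and t > 0.
    n = len(rho)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for x in range(n):
        for y in range(n):
            if x != y and K[x][y] > 0 and rho[x] > 0 and rho[y] > 0:
                parent[find(x)] = find(y)
    return [find(x) for x in range(n)]

def class_masses(K, pi, rho):
    # For each x, the pi-mass of rho carried by the class of x.
    lab = classes(K, rho)
    n = len(rho)
    return [sum(rho[y] * pi[y] for y in range(n) if lab[y] == lab[x])
            for x in range(n)]

def condition_3_12(K, pi, rho0, rho1, tol=1e-12):
    m0, m1 = class_masses(K, pi, rho0), class_masses(K, pi, rho1)
    return all(abs(a - b) <= tol for a, b in zip(m0, m1))

# Hypothetical reversible chain on a path with three states.
K = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]
pi = [1 / 6, 1 / 2, 1 / 3]

rho_a = [1.5, 0.9, 0.9]   # strictly positive
rho_b = [0.6, 1.2, 0.9]   # strictly positive
rho_c = [3.0, 0.0, 1.5]   # vanishes at the middle state
rho_d = [1.2, 0.0, 2.4]   # same support as rho_c, different class masses

print(condition_3_12(K, pi, rho_a, rho_b))  # both strictly positive
print(condition_3_12(K, pi, rho_c, rho_d))  # equal supports, but (3.12) fails
```

The second pair shows that equality of supports alone does not give (3.12): the middle state disconnects the two outer states, and the mass carried by each of them differs.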

The proof of Theorem 3.12 relies on a sequence of lemmas of independent 
interest. 

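First we prove two comparison results, which relate the pseudo-metric $\mathcal{W}$ on $\mathscr{P}(\mathcal{X})$ to the pseudo-metric $\mathcal{W}_{p,q}$ on $\mathscr{P}(\mathcal{Y})$, where $\mathcal{Y} = \{a, b\}$ is a two-point space endowed with the Markov kernel (2.1) with parameters $p$ and $q$.

Lemma 3.14 (Comparison to the two-point space I). Let $a, b \in \mathcal{X}$ be distinct points with $K(a,b) > 0$, and set $p := K(a,b)\pi(a)$. Suppose that $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ satisfy $\bar\rho_0(x) = \bar\rho_1(x)$ for all $x \in \mathcal{X} \setminus \{a,b\}$. Consider the two-point space $\mathcal{Y} = \{\alpha, \beta\}$ endowed with the Markov kernel defined by $K(\alpha,\beta) := K(\beta,\alpha) := p$. For $i = 0, 1$, let $\tilde\rho_i : \mathcal{Y} \to \mathbb{R}_+$ be defined by

\[ \tilde\rho_i(\alpha) := 2 \bar\rho_i(a) \pi(a), \qquad \tilde\rho_i(\beta) := 2 \bar\rho_i(b) \pi(b). \]

Then we have

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1) \leq \sqrt{C_d}\, \mathcal{W}_{p,p}(\tilde\rho_0, \tilde\rho_1), \]

where $C_d$ is the constant from (A7). In particular, if (A7') holds, then

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1) \leq \mathcal{W}_{p,p}(\tilde\rho_0, \tilde\rho_1). \]

Remark 3.15. Note that $\tilde\rho_0$ and $\tilde\rho_1$ are not necessarily probability densities on $\{\alpha, \beta\}$, but they do have equal mass, since

\[ \tilde\rho_i(\alpha) \pi(\alpha) + \tilde\rho_i(\beta) \pi(\beta) = \bar\rho_j(a) \pi(a) + \bar\rho_j(b) \pi(b) \]

for $i, j \in \{0, 1\}$. Therefore $\mathcal{W}_{p,p}(\tilde\rho_0, \tilde\rho_1)$ can be interpreted in the sense of Remark 3.7.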

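Proof of Lemma 3.14. Let $\varepsilon > 0$ and take $(\tilde\rho, \psi) \in \mathcal{CE}(\tilde\rho_0, \tilde\rho_1)$. Writing $\hat{\tilde\rho}_t(\alpha,\beta) := \theta(\tilde\rho_t(\alpha), \tilde\rho_t(\beta))$, it then follows that

\[ \dot{\tilde\rho}_t(\alpha) + (\psi_t(\beta) - \psi_t(\alpha)) K(\alpha,\beta) \hat{\tilde\rho}_t(\alpha,\beta) = 0, \qquad \dot{\tilde\rho}_t(\beta) + (\psi_t(\alpha) - \psi_t(\beta)) K(\beta,\alpha) \hat{\tilde\rho}_t(\alpha,\beta) = 0. \tag{3.13} \]

For $t \in (0,1)$ define $\rho_t \in \mathscr{P}(\mathcal{X})$ by

\[ \rho_t(a) := \frac{\tilde\rho_t(\alpha)}{2\pi(a)}, \qquad \rho_t(b) := \frac{\tilde\rho_t(\beta)}{2\pi(b)}, \qquad \rho_t(x) := \bar\rho_0(x), \]

for $x \in \mathcal{X} \setminus \{a,b\}$. Furthermore, we define $\Psi_t : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ by

\[ \Psi_t(a,b) := -\Psi_t(b,a) := -\frac{\hat{\tilde\rho}_t(\alpha,\beta)}{2 \hat\rho_t(a,b)} (\psi_t(\alpha) - \psi_t(\beta))\, 1_{\{\hat\rho_t(a,b) > 0\}}, \qquad \Psi_t(x,y) := 0 \]

for all other values of $x, y \in \mathcal{X}$. Using (3.13) it then follows that $(\rho, \Psi) \in \mathcal{CE}'(\bar\rho_0, \bar\rho_1)$. Using Lemma 3.6 and the detailed balance equations (3.1) we thus obtain

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 \leq \int_0^1 \Psi_t(a,b)^2 \hat\rho_t(a,b) K(a,b) \pi(a) \, dt = \frac12 \int_0^1 (\psi_t(\alpha) - \psi_t(\beta))^2 \frac{\hat{\tilde\rho}_t(\alpha,\beta)^2}{\hat\rho_t(a,b)} 1_{\{\hat\rho_t(a,b) > 0\}} K(\alpha,\beta) \pi(\alpha) \, dt. \]

Using (A6) and (A7) we infer that

\[ \hat{\tilde\rho}_t(\alpha,\beta) = \theta(2\pi(a)\rho_t(a), 2\pi(b)\rho_t(b)) \leq 2 C_d\, \theta(\rho_t(a), \rho_t(b)) = 2 C_d\, \hat\rho_t(a,b), \]

which yields

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 \leq C_d \int_0^1 (\psi_t(\alpha) - \psi_t(\beta))^2 \hat{\tilde\rho}_t(\alpha,\beta) K(\alpha,\beta) \pi(\alpha) \, dt. \]

Minimising the right-hand side over all $(\tilde\rho, \psi) \in \mathcal{CE}(\tilde\rho_0, \tilde\rho_1)$, the result follows. □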

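Lemma 3.16 (Comparison to the two-point space II). Let $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ and set $\beta_i(x) = 1 - 2\bar\rho_i(x)\pi(x)$ for $i = 0, 1$ and $x \in \mathcal{X}$. Then the bound

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1) \geq c \sup_{x \in \mathcal{X}} \mathcal{W}_{1,1}\big(\rho^{\beta_0(x)}, \rho^{\beta_1(x)}\big) \]

holds, for some $c > 0$ depending only on $K$, $\pi$ and $\theta$. If (A7') and (A8) hold, then

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1) \geq \sup_{x \in \mathcal{X}} \mathcal{W}_{1,1}\big(\rho^{\beta_0(x)}, \rho^{\beta_1(x)}\big). \]

Proof. First we shall prove the result under the assumption that (A7') and (A8) hold. Fix $o \in \mathcal{X}$ and let $\mathcal{Y} = \{a, b\}$ be a two-point space endowed with the Markov kernel (2.1) with $p = q = 1$. For $\rho \in \mathscr{P}(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$ we define, by a slight abuse of notation, $\rho \in \mathscr{P}(\mathcal{Y})$ and $\psi \in \mathbb{R}^{\mathcal{Y}}$ by

\[ \rho(a) := 2\rho(o)\pi(o), \qquad \rho(b) := 2 \sum_{x \neq o} \rho(x)\pi(x), \]

\[ \psi(a) := \psi(o), \qquad \psi(b) := \frac{\sum_{x \neq o} \psi(x) K(o,x) \hat\rho(o,x)}{\sum_{x \neq o} K(o,x) \hat\rho(o,x)}. \]

In the definition of $\psi(b)$ we use the convention that $0/0 = 0$. Observe that $\rho$ indeed belongs to $\mathscr{P}(\mathcal{Y})$ since $\pi(a) = \pi(b) = \frac12$ and $\rho(a) + \rho(b) = 2$. We set $\check\rho(a,b) := 2\pi(o) \sum_{x \neq o} K(o,x) \hat\rho(o,x)$ and claim that

\[ \check\rho(a,b) \leq \hat\rho(a,b), \tag{3.14} \]

\[ [A(\rho)\psi, \psi] \geq \frac12 (\psi(a) - \psi(b))^2 \check\rho(a,b). \tag{3.15} \]

In the proof of both claims we shall assume that $\check\rho(a,b) > 0$, since otherwise there is nothing to prove. To prove (3.14), note first that for any $x \in \mathcal{X}$ with $K(o,x) > 0$,

\[ \frac{\pi(x)}{\pi(o)} = \frac{K(o,x)}{K(x,o)} \geq K(o,x). \tag{3.16} \]

Using this inequality together with (A6), (A7') and (A8),

\[ \hat\rho(a,b) = \theta\Big( 2\rho(o)\pi(o),\; 2 \sum_{x \neq o} \rho(x)\pi(x) \Big) = 2\pi(o)\, \theta\Big( \rho(o),\; \sum_{x \neq o} \rho(x) \frac{\pi(x)}{\pi(o)} \Big) \geq 2\pi(o)\, \theta\Big( \rho(o),\; \sum_{x \neq o} K(o,x) \rho(x) \Big) \geq 2\pi(o) \sum_{x \neq o} K(o,x)\, \theta(\rho(o), \rho(x)) = \check\rho(a,b), \]

which proves (3.14).

To prove (3.15), write $k(x) := K(o,x)\hat\rho(o,x)$ for brevity and note that, by the Cauchy-Schwarz inequality,

\[ \psi(b)^2 \sum_{x \neq o} k(x) \leq \sum_{x \neq o} \psi(x)^2 k(x). \]

Using the detailed balance equations (3.1) in the first inequality, we obtain

\[ [A(\rho)\psi, \psi] = \frac12 \sum_{x,y \in \mathcal{X}} (\psi(x) - \psi(y))^2 K(x,y) \hat\rho(x,y) \pi(x) \geq \sum_{x \neq o} (\psi(o) - \psi(x))^2 K(o,x) \hat\rho(o,x) \pi(o) \]

\[ = \Big( \psi(o)^2 \sum_{x \neq o} k(x) - 2\psi(o) \sum_{x \neq o} \psi(x) k(x) + \sum_{x \neq o} \psi(x)^2 k(x) \Big) \pi(o) \]

\[ \geq \frac12 \psi(a)^2 \check\rho(a,b) - \psi(a)\psi(b) \check\rho(a,b) + \frac12 \psi(b)^2 \check\rho(a,b) = \frac12 (\psi(a) - \psi(b))^2 \check\rho(a,b), \]

which proves (3.15).

Take $(\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1)$. Since

\[ \dot\rho_t(o) + \sum_{x \neq o} (\psi_t(x) - \psi_t(o)) K(o,x) \hat\rho_t(o,x) = 0, \]

it follows that

\[ \dot\rho_t(a) + (\psi_t(b) - \psi_t(a)) \check\rho_t(a,b) = 0. \tag{3.17} \]

Set $\beta_t := 1 - 2\rho_t(o)\pi(o)$ for $t \in [0,1]$ and note that $\dot\beta_t = 0$ if $\check\rho_t(a,b) = 0$. Using (3.15), (3.17), (3.14) and Lemma 2.3 we obtain

\[ \int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt \geq \frac12 \int_0^1 (\psi_t(a) - \psi_t(b))^2 \check\rho_t(a,b) \, dt = \frac12 \int_0^1 \frac{\dot\beta_t^2\, 1_{\{\check\rho_t(a,b) > 0\}}}{\check\rho_t(a,b)} \, dt \geq \frac12 \int_0^1 \frac{\dot\beta_t^2\, 1_{\{\hat\rho_t(a,b) > 0\}}}{\hat\rho_t(a,b)} \, dt \geq \mathcal{W}_{1,1}^2\big(\rho^{\beta_0}, \rho^{\beta_1}\big). \]

Taking the infimum over all pairs $(\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1)$, we infer that

\[ \mathcal{W}^2(\bar\rho_0, \bar\rho_1) \geq \mathcal{W}_{1,1}^2\big(\rho^{\beta_0}, \rho^{\beta_1}\big). \]

The result follows by taking the supremum over $o \in \mathcal{X}$.

Finally, without assuming (A7') and (A8), the same argument applies, if one replaces (3.14) by the following estimate, which uses the doubling property (A7), (3.16) and (A5):

\[ \check\rho(a,b) = 2\pi(o) \sum_{x \neq o} K(o,x)\, \theta(\rho(o), \rho(x)) \leq C \sum_{x \neq o} \theta\big( 2\rho(o) K(o,x) \pi(o),\; 2\rho(x) K(o,x) \pi(o) \big) \]

\[ \leq C \sum_{x \neq o} \theta\big( 2\rho(o)\pi(o),\; 2\rho(x)\pi(x) \big) \leq C |\mathcal{X}|\, \theta\Big( 2\rho(o)\pi(o),\; 2 \sum_{x \neq o} \rho(x)\pi(x) \Big) = C |\mathcal{X}|\, \hat\rho(a,b). \]

□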

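The next lemma provides a useful characterisation of the kernel and the range of the matrices $A(\rho)$ and $B(\rho)$.

Lemma 3.17. For $\rho \in \mathscr{P}(\mathcal{X})$ we have

\[ \mathrm{Ker}\, A(\rho) = \mathrm{Ker}\, B(\rho) = \{ \psi \in \mathbb{R}^{\mathcal{X}} \mid \psi(x) = \psi(y) \text{ whenever } x \sim_\rho y \}, \]

\[ \mathrm{Ran}\, A(\rho) = \Big\{ \psi \in \mathbb{R}^{\mathcal{X}} \;\Big|\; \forall x \in \mathcal{X} : \sum_{y \sim_\rho x} \psi(y) = 0 \Big\}, \]

\[ \mathrm{Ran}\, B(\rho) = \Big\{ \psi \in \mathbb{R}^{\mathcal{X}} \;\Big|\; \forall x \in \mathcal{X} : \sum_{y \sim_\rho x} \psi(y)\pi(y) = 0 \Big\}. \]

Proof. Recall that (A3) and (A5) imply that $\hat\rho(x,y) = 0$ whenever $\rho(x) = 0$ or $\rho(y) = 0$. Therefore the assertions concerning $A(\rho)$ follow directly from Lemma A.1. Since $B(\rho) = \Pi^{-1} A(\rho)$, one has

\[ \mathrm{Ker}\, B(\rho) = \mathrm{Ker}\, A(\rho), \qquad \mathrm{Ran}\, B(\rho) = \Pi^{-1}\, \mathrm{Ran}\, A(\rho), \]

hence the remaining assertions follow as well. □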

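For $\sigma \in \mathscr{P}(\mathcal{X})$ and $a \geq 0$ we shall use the notation

\[ \mathscr{P}_\sigma^a(\mathcal{X}) := \{ \rho \in \mathscr{P}(\mathcal{X}) \mid \forall x \in \mathcal{X} : \text{(3.12) holds with } \rho_0 = \rho \text{ and } \rho_1 = \sigma;\; \forall z \in \mathrm{supp}(\sigma) : \rho(z) \geq a \}. \]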

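Lemma 3.18. For $\rho \in \mathscr{P}(\mathcal{X})$, $B(\rho)$ restricts to an isomorphism from $\mathrm{Ran}\, A(\rho)$ onto $\mathrm{Ran}\, B(\rho)$. Moreover, for $\sigma \in \mathscr{P}(\mathcal{X})$ and $a > 0$ there exist constants $0 < c < C < \infty$ such that the bound

\[ c \|\psi\| \leq \|B(\rho)\psi\| \leq C \|\psi\| \tag{3.18} \]

holds for all $\rho \in \mathscr{P}_\sigma^a(\mathcal{X})$ and all $\psi \in \mathrm{Ran}\, A(\sigma)$.

Proof. Since $A(\rho)$ is self-adjoint, $A(\rho)$ restricts to an isomorphism on its range. Since $\Pi^{-1}$ is an isomorphism from $\mathrm{Ran}\, A(\rho)$ onto $\mathrm{Ran}\, B(\rho)$ and $B(\rho) = \Pi^{-1} A(\rho)$, the first assertion follows.

Lemma 3.17 implies that $\mathrm{Ran}\, A(\rho) = \mathrm{Ran}\, A(\sigma)$ and $\mathrm{Ran}\, B(\rho) = \mathrm{Ran}\, B(\sigma)$ for all $\rho \in \mathscr{P}_\sigma^a(\mathcal{X})$. Thus $B(\rho)$ restricts to an isomorphism, denoted by $B_\rho$, from $\mathrm{Ran}\, A(\sigma)$ onto $\mathrm{Ran}\, B(\sigma)$. Since the mapping $\mathscr{P}_\sigma^a(\mathcal{X}) \ni \rho \mapsto \|B_\rho^{-1}\|$ is continuous w.r.t. the euclidean metric and strictly positive, the lower bound in (3.18) follows by compactness. The upper bound is clear, since the entries of $B(\rho)$ are bounded uniformly in $\rho$. □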

The next result provides a partial converse to Lemma 3.10. 

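Lemma 3.19. Fix $\sigma \in \mathscr{P}(\mathcal{X})$ and $a > 0$. There exist constants $0 < c < C < \infty$ such that for all $\bar\rho_0, \bar\rho_1 \in \mathscr{P}_\sigma^a(\mathcal{X})$ we have

\[ c\, d_{TV}(\bar\rho_0, \bar\rho_1) \leq \mathcal{W}(\bar\rho_0, \bar\rho_1) \leq C\, d_{TV}(\bar\rho_0, \bar\rho_1). \]

Proof. Since the lower bound for $\mathcal{W}$ has been proved in Lemma 3.10, it remains to prove the upper bound.

For $t \in [0,1]$ set $\rho_t := (1-t)\bar\rho_0 + t\bar\rho_1$ and note that $\rho_t \in \mathscr{P}_\sigma^a(\mathcal{X})$. Since

\[ \dot\rho_t = \bar\rho_1 - \bar\rho_0 \in \mathrm{Ran}\, B(\rho_t) = \mathrm{Ran}\, B(\sigma) \]

by Lemma 3.17, Lemma 3.18 implies that, for each $t \in [0,1]$, there exists a unique element $\psi_t \in \mathrm{Ran}\, A(\rho_t)$ satisfying

\[ \dot\rho_t = B(\rho_t)\psi_t. \]

Moreover, Lemma 3.18 implies that

\[ \|\psi_t\| \leq C \|\bar\rho_1 - \bar\rho_0\| \]

for some constant $C > 0$ that does not depend on $\bar\rho_0$, $\bar\rho_1$ and $t$. It thus follows that

\[ \mathcal{W}(\bar\rho_0, \bar\rho_1)^2 \leq \int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt \leq C^2 C' \|\bar\rho_1 - \bar\rho_0\|^2 \leq C^2 C' C''\, d_{TV}^2(\bar\rho_0, \bar\rho_1), \]

where $C' := \sup_{\rho \in \mathscr{P}(\mathcal{X})} \|A(\rho)\| < \infty$ and $C'' > 0$ depends only on $\pi$. □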

Now we are ready to prove the main result of this subsection. 

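Proof of Theorem 3.12. Since $K$ is irreducible, (1) follows from Lemma 3.14, Remark 3.7 and the triangle inequality for $\mathcal{W}$.

The implication (b) $\Rightarrow$ (a) of (2) follows from Lemma 3.19.

In order to prove the converse implication, we take $\bar\rho_0, \bar\rho_1 \in \mathscr{P}(\mathcal{X})$ with $\mathcal{W}(\bar\rho_0, \bar\rho_1) < \infty$ and claim that $\mathrm{supp}\, \bar\rho_0 = \mathrm{supp}\, \bar\rho_1$. Indeed, if the claim were false, then there would exist $x \in \mathcal{X}$ with $\bar\rho_0(x) = 0$ and $\bar\rho_1(x) > 0$ (or vice versa). Set $\beta = 1 - 2\pi(x)\bar\rho_1(x)$ and note that $\beta \in [-1, 1)$. Lemma 3.16 implies that $\mathcal{W}(\bar\rho_0, \bar\rho_1) \geq c\, \mathcal{W}_{1,1}(\rho^1, \rho^\beta)$ for some $c > 0$. Since $C_\theta = \infty$, the right-hand side is infinite, which contradicts our assumption and thus proves the claim.

Let $(\rho, \psi) \in \mathcal{CE}(\bar\rho_0, \bar\rho_1)$ with $\int_0^1 [A(\rho_t)\psi_t, \psi_t] \, dt < \infty$. The claim implies that $\mathrm{supp}\, \bar\rho_0 = \mathrm{supp}\, \rho_t$ for all $t \in [0,1]$ and therefore $x \sim_{\rho_t} y$ if and only if $x \sim_{\bar\rho_0} y$. Fix $z \in \mathrm{supp}\, \bar\rho_0$ and take $x \in \mathcal{X}$ with $x \sim_{\bar\rho_0} z$. Since $K(x,y)\hat\rho_t(x,y) = 0$ whenever $y \not\sim_{\bar\rho_0} z$, we have

\[ \dot\rho_t(x) + \sum_{y \sim_{\bar\rho_0} z} (\psi_t(y) - \psi_t(x)) K(x,y) \hat\rho_t(x,y) = 0. \]

Multiplying this identity by $\pi(x)$ and summing over $x \in \mathcal{X}$ with $x \sim_{\bar\rho_0} z$, it follows using the detailed balance equations (3.1) that

\[ \sum_{x \sim_{\bar\rho_0} z} \dot\rho_t(x) \pi(x) = 0, \]

which implies (3.12). □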

Remark 3.20. Alternatively, the implication (b) $\Rightarrow$ (a) in the proof of Theorem 3.12 can be proved as an application of Lemma 3.14.

We continue to prove the remaining parts of Theorem 1.1. 

Theorem 3.21 (Topology). Let $\sigma \in \mathscr{P}(\mathcal{X})$. For $\rho, \rho_\alpha \in \mathscr{P}_\sigma(\mathcal{X})$, the following assertions are equivalent:

(1) $\lim_\alpha d_{TV}(\rho_\alpha, \rho) = 0$; (2) $\lim_\alpha \mathcal{W}(\rho_\alpha, \rho) = 0$.

Proof. It follows from Lemma 3.10 that (2) implies (1).

Conversely, suppose that (1) holds. If $C_\theta < \infty$, then (2) follows easily using Lemma 3.14. If $C_\theta = \infty$, there exists an index $\bar\alpha$ and a constant $b > 0$ such that $\rho$ and $\rho_\alpha$ belong to $\mathscr{P}_\sigma^b(\mathcal{X})$ for every $\alpha \geq \bar\alpha$. Lemma 3.19 implies then that there exists a constant $C > 0$ such that

\[ \mathcal{W}(\rho_\alpha, \rho) \leq C\, d_{TV}(\rho_\alpha, \rho) \]

for all $\alpha \geq \bar\alpha$, which yields the result. □

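Theorem 3.22 (Completeness). For every $\sigma \in \mathscr{P}(\mathcal{X})$ the metric space $(\mathscr{P}_\sigma(\mathcal{X}), \mathcal{W})$ is complete.

Proof. If $C_\theta < \infty$, this follows directly from Lemma 3.10 and Theorem 3.21. If $C_\theta = \infty$, take a sequence $(\rho_n)_n$ in $\mathscr{P}_\sigma(\mathcal{X})$ which is Cauchy with respect to $\mathcal{W}$. In particular, $(\rho_n)_n$ is bounded in the $\mathcal{W}$-metric, hence by Lemma 3.16 there exists a constant $a > 0$ such that $\rho_n$ belongs to $\mathscr{P}_\sigma^a(\mathcal{X})$ for every $n$. By Lemma 3.10 $(\rho_n)_n$ is Cauchy in the total variation metric, hence $\rho_n$ converges to some $\rho \in \mathscr{P}(\mathcal{X})$ in total variation. Since $\mathscr{P}_\sigma^a(\mathcal{X})$ is a $d_{TV}$-closed subset of $\mathscr{P}(\mathcal{X})$, it follows that $\rho$ belongs to $\mathscr{P}_\sigma^a(\mathcal{X})$. From Theorem 3.21 we then infer that $\rho_n$ converges to $\rho$ in $\mathcal{W}$-metric, which yields the desired result. □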

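Riemannian structure. Fix a probability density $\sigma \in \mathscr{P}(\mathcal{X})$ and consider the space

\[ \mathscr{P}'_\sigma(\mathcal{X}) := \Big\{ \rho \in \mathscr{P}(\mathcal{X}) \;\Big|\; \forall x \in \mathcal{X} : \sum_{y \sim_\rho x} \rho(y)\pi(y) = \sum_{y \sim_\sigma x} \sigma(y)\pi(y) \Big\}. \]

Note that $\mathscr{P}'_{\mathbf{1}}(\mathcal{X}) = \mathscr{P}_*(\mathcal{X})$ where $\mathbf{1}$ denotes the uniform density with respect to $\pi$. Moreover, if $C_\theta = \infty$, Theorem 3.12 implies that $\mathscr{P}'_\sigma(\mathcal{X}) = \mathscr{P}_\sigma(\mathcal{X})$ for all $\sigma \in \mathscr{P}(\mathcal{X})$.

Our next aim is to show that the metric space $(\mathscr{P}'_\sigma(\mathcal{X}), \mathcal{W})$ is a Riemannian manifold. First, we have the following result:

Proposition 3.23. The metric space $(\mathscr{P}'_\sigma(\mathcal{X}), \mathcal{W})$ is a smooth manifold of dimension

\[ d(\sigma) := |\mathrm{supp}\, \sigma| - n(\sigma), \]

where $|\mathrm{supp}\, \sigma|$ is the cardinality of $\mathrm{supp}\, \sigma$, and $n(\sigma)$ is the number of equivalence classes in the support of $\sigma$ for the equivalence relation $\sim_\sigma$.

Proof. It follows from Theorem 3.12 and Lemma 3.17 that $\mathscr{P}'_\sigma(\mathcal{X})$ is a relatively open subset of the affine subspace

\[ S_\sigma := \sigma + \mathrm{Ran}\, B(\sigma) \subseteq \mathbb{R}^{\mathcal{X}}. \]

Theorem 3.21 implies that the topology induced by $\mathcal{W}$ coincides with the euclidean topology on $\mathscr{P}'_\sigma(\mathcal{X})$, hence $(\mathscr{P}'_\sigma(\mathcal{X}), \mathcal{W})$ is a smooth manifold.

The assertion concerning the dimension follows immediately, since $d(\sigma)$ is the dimension of $\mathrm{Ran}\, B(\sigma)$. □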

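Fix $\sigma \in \mathscr{P}(\mathcal{X})$ and $\rho \in \mathscr{P}'_\sigma(\mathcal{X})$. Since $\mathscr{P}'_\sigma(\mathcal{X})$ is an open subset of the affine space $\sigma + \mathrm{Ran}\, B(\sigma)$, the tangent space of $\mathscr{P}'_\sigma(\mathcal{X})$ at $\rho$ can be naturally identified with $\mathrm{Ran}\, B(\sigma) = \mathrm{Ran}\, B(\rho)$. Our next aim is to show that the tangent space can be identified with a space of gradients, in the spirit of the Otto calculus developed in [23]. In fact, we shall construct an isomorphism $\mathcal{I}_\rho$ from $\mathrm{Ran}\, B(\sigma)$ onto

\[ T_\rho := \{ \nabla\psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}} : \psi \in \mathrm{Ran}\, A(\rho) \}. \]

Remark 3.24. Note that if $\rho$ belongs to $\mathscr{P}_*(\mathcal{X})$ we have

\[ T_\rho = \{ \nabla\psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}} : \psi \in \mathbb{R}^{\mathcal{X}} \}. \]

However, it is easy to see that this is no longer true if $\rho \notin \mathscr{P}_*(\mathcal{X})$.

Proposition 3.25. Let $\rho \in \mathscr{P}'_\sigma(\mathcal{X})$. The mapping

\[ \mathcal{I}_\rho : \mathrm{Ran}\, B(\sigma) \to T_\rho, \qquad B(\rho)\psi \mapsto \nabla\psi, \]

defined for $\psi \in \mathrm{Ran}\, A(\rho)$, is a linear isomorphism.

Proof. To show that $\mathcal{I}_\rho$ is well-defined, consider the following mappings:

\[ F_\rho : \mathrm{Ran}\, A(\rho) \to \mathrm{Ran}\, B(\rho), \quad \psi \mapsto B(\rho)\psi, \]

\[ G : \mathrm{Ran}\, A(\rho) \to T_\rho, \quad \psi \mapsto \nabla\psi. \]

We claim that $F_\rho$ and $G$ are linear isomorphisms. Once this has been established, the proposition follows at once. The claim for $F_\rho$ has been proved in Lemma 3.18. To prove the claim for $G$, suppose that $\nabla\psi = 0$ for some $\psi \in \mathrm{Ran}\, A(\rho)$. It then follows that

\[ [A(\rho)\psi, \psi] = \langle \nabla\psi, \nabla\psi \rangle_\rho = 0. \]

Since $A(\rho)$ is symmetric and $\psi \in \mathrm{Ran}\, A(\rho)$, it follows that $\psi = 0$, which completes the proof. □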

The following statement clarifies the connection with the Otto calculus in 
the continuous setting: 

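Proposition 3.26. Let $\rho : [0,1] \to \mathscr{P}'_\sigma(\mathcal{X})$ be differentiable at $t \in [0,1]$. Then $\mathcal{I}_{\rho_t} \dot\rho_t$ is the unique element $\nabla\psi_t \in T_{\rho_t}$ satisfying the identity

\[ \dot\rho_t + \nabla \cdot (\hat\rho_t \bullet \nabla\psi_t) = 0. \]

Proof. Since $B(\rho)\psi = -\nabla \cdot (\hat\rho \bullet \nabla\psi)$ for $\rho \in \mathscr{P}(\mathcal{X})$ and $\psi \in \mathbb{R}^{\mathcal{X}}$, this is an immediate consequence of Proposition 3.25. □

Henceforth we shall identify the tangent space of $\mathscr{P}'_\sigma(\mathcal{X})$ at $\rho$ with $T_\rho$ by means of the isomorphism $\mathcal{I}_\rho$.

Definition 3.27. Let $\rho \in \mathscr{P}'_\sigma(\mathcal{X})$. We endow $T_\rho$ with the inner product

\[ \langle \nabla\varphi, \nabla\psi \rangle_\rho = \frac12 \sum_{x,y \in \mathcal{X}} (\varphi(x) - \varphi(y)) (\psi(x) - \psi(y)) K(x,y) \hat\rho(x,y) \pi(x), \]

defined for $\varphi, \psi \in \mathrm{Ran}\, A(\rho)$.

Note that, for $\rho \in \mathscr{P}'_\sigma(\mathcal{X})$ and $\varphi, \psi \in \mathrm{Ran}\, A(\rho)$,

\[ \langle \nabla\varphi, \nabla\psi \rangle_\rho = [A(\rho)\varphi, \psi]. \tag{3.19} \]

Remark 3.28. It is clear from the definition that $\langle \nabla\varphi, \nabla\psi \rangle_\rho$ is well-defined. Moreover, (3.19) implies that if $\langle \nabla\psi, \nabla\psi \rangle_\rho = 0$ for some $\psi \in \mathrm{Ran}\, A(\rho)$, then $\psi = 0$, thus the expression indeed defines an inner product on $T_\rho$.

Theorem 3.29. The following statements hold:

• If $C_\theta < \infty$ and (A8) holds, then $(\mathscr{P}_*(\mathcal{X}), \mathcal{W})$ is a Riemannian manifold.

• If $C_\theta = \infty$, then $(\mathscr{P}'_\sigma(\mathcal{X}), \mathcal{W})$ is a complete Riemannian manifold for every $\sigma \in \mathscr{P}(\mathcal{X})$.

The Riemannian metric is given by Definition 3.27.

Proof. Suppose first that $C_\theta = \infty$. Then Proposition 3.23 asserts that $(\mathscr{P}'_\sigma(\mathcal{X}), \mathcal{W})$ is a smooth manifold and the completeness has been proved in Theorem 3.22. The result would follow immediately from Lemma 3.5 and Definition 3.27, if we were allowed to add the following requirements to the definition of $\mathcal{CE}(\bar\rho_0, \bar\rho_1)$ without changing the value of $\mathcal{W}(\bar\rho_0, \bar\rho_1)$:

(i) $\rho_t \in \mathscr{P}'_\sigma(\mathcal{X})$ for all $t \in [0,1]$;

(ii) $\psi_t \in \mathrm{Ran}\, A(\rho_t)$ for all $t \in [0,1]$.

But (i) may be added by Theorem 3.12 and (ii) may be added in view of the orthogonal decomposition $\mathbb{R}^{\mathcal{X}} = \mathrm{Ran}\, A(\rho) \oplus \mathrm{Ker}\, A(\rho)$.

If $C_\theta < \infty$ the same argument applies, with Lemma 3.30 below providing the analogue of (i). □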

The next result asserts that in the definition of W, only curves consisting 
of strictly positive densities need to be considered if the endpoints are strictly 
positive as well. 

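Lemma 3.30. Suppose that (A8) holds. For $\bar\rho_0, \bar\rho_1 \in \mathscr{P}_*(\mathcal{X})$, we may replace (iii) in Definition 3.3 by "(iii'): $\rho_t \in \mathscr{P}_*(\mathcal{X})$ for all $t \in [0,T]$".

Proof. For notational reasons, let us write

\[ \mathcal{A}(\rho, \Psi) := \|\Psi\|_\rho^2 = \frac12 \sum_{x,y \in \mathcal{X}} \Psi(x,y)^2 K(x,y) \hat\rho(x,y) \pi(x) \]

for $\rho \in \mathscr{P}(\mathcal{X})$ and $\Psi \in \mathbb{R}^{\mathcal{X} \times \mathcal{X}}$. Let $0 < \varepsilon < 1$ and let $(\rho, \Psi) \in \mathcal{CE}'(\bar\rho_0, \bar\rho_1)$ be such that

\[ \int_0^1 \mathcal{A}(\rho_t, \Psi_t) \, dt < \mathcal{W}^2(\bar\rho_0, \bar\rho_1) + \varepsilon. \]

We set $\bar\rho_i^\varepsilon = (1-\varepsilon)\bar\rho_i + \varepsilon$ for $i = 0, 1$.

Firstly, we define $(\rho^\varepsilon, \Psi^\varepsilon) \in \mathcal{CE}'(\bar\rho_0^\varepsilon, \bar\rho_1^\varepsilon)$ by

\[ \rho_t^\varepsilon(x) := (1-\varepsilon)\rho_t(x) + \varepsilon, \qquad \Psi_t^\varepsilon(x,y) := (1-\varepsilon) \frac{\hat\rho_t(x,y)}{\hat\rho_t^\varepsilon(x,y)} \Psi_t(x,y). \]

The concavity assumption (A8) implies the convexity of the function

\[ \mathbb{R} \times \mathbb{R}_+ \times \mathbb{R}_+ \ni (x, s, t) \mapsto \frac{x^2}{\theta(s,t)}, \]

which yields

\[ \int_0^1 \mathcal{A}(\rho_t^\varepsilon, \Psi_t^\varepsilon) \, dt \leq (1-\varepsilon) \int_0^1 \mathcal{A}(\rho_t, \Psi_t) \, dt \leq (1-\varepsilon) \mathcal{W}^2(\bar\rho_0, \bar\rho_1) + \varepsilon. \]

Secondly, for $i = 0, 1$, we define $(\rho^{i,\varepsilon}, \Psi^{i,\varepsilon}) \in \mathcal{CE}'(\bar\rho_i, \bar\rho_i^\varepsilon)$ by linear interpolation, i.e.,

\[ \rho_t^{i,\varepsilon} := (1-t)\bar\rho_i + t\bar\rho_i^\varepsilon. \]

As in the proof of Lemma 3.19, for $t \in (0,1)$, let $\psi_t^{i,\varepsilon}$ be the unique element in $\mathrm{Ran}\, A(\rho_t^{i,\varepsilon})$ satisfying $\dot\rho_t^{i,\varepsilon} = B(\rho_t^{i,\varepsilon})\psi_t^{i,\varepsilon}$. Setting $\Psi_t^{i,\varepsilon} := \nabla\psi_t^{i,\varepsilon}$, it then follows that $(\rho^{i,\varepsilon}, \Psi^{i,\varepsilon}) \in \mathcal{CE}'(\bar\rho_i, \bar\rho_i^\varepsilon)$. Lemma 3.19 and its proof imply that there exists a constant $C > 0$, independent of $\varepsilon > 0$, such that

\[ \int_0^1 \mathcal{A}(\rho_t^{i,\varepsilon}, \Psi_t^{i,\varepsilon}) \, dt \leq C\, d_{TV}^2(\bar\rho_i, \bar\rho_i^\varepsilon) \leq 4C\varepsilon^2. \]

Finally, it remains to rescale the three curves in time and glue them together. We thus define

\[ (\tilde\rho_t^\varepsilon, \tilde\Psi_t^\varepsilon) := \begin{cases} \big( \rho_{t/\varepsilon}^{0,\varepsilon},\; \varepsilon^{-1} \Psi_{t/\varepsilon}^{0,\varepsilon} \big), & t \in [0, \varepsilon], \\ \big( \rho_{(t-\varepsilon)/(1-2\varepsilon)}^{\varepsilon},\; (1-2\varepsilon)^{-1} \Psi_{(t-\varepsilon)/(1-2\varepsilon)}^{\varepsilon} \big), & t \in (\varepsilon, 1-\varepsilon), \\ \big( \rho_{(1-t)/\varepsilon}^{1,\varepsilon},\; -\varepsilon^{-1} \Psi_{(1-t)/\varepsilon}^{1,\varepsilon} \big), & t \in [1-\varepsilon, 1], \end{cases} \]

so that $(\tilde\rho^\varepsilon, \tilde\Psi^\varepsilon) \in \mathcal{CE}'(\bar\rho_0, \bar\rho_1)$. We infer that

\[ \int_0^1 \mathcal{A}(\tilde\rho_t^\varepsilon, \tilde\Psi_t^\varepsilon) \, dt \leq 8C\varepsilon + \frac{(1-\varepsilon) \mathcal{W}^2(\bar\rho_0, \bar\rho_1) + \varepsilon}{1 - 2\varepsilon}. \]

Since the right-hand side tends to $\mathcal{W}^2(\bar\rho_0, \bar\rho_1)$ as $\varepsilon \to 0$, the result follows from the observation that $\tilde\Psi_t^\varepsilon$ may be replaced by $P_{\tilde\rho_t^\varepsilon} \tilde\Psi_t^\varepsilon$, as in the proof of Lemma 3.6. □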

In the next result we will slightly abuse notation and write

\[ \partial_1 \hat\rho(x,y) := \partial_1 \theta(\rho(x), \rho(y)). \]

Theorem 3.31 (Geodesics). Suppose that $C_\theta = \infty$ and let $\sigma \in \mathscr{P}(\mathcal{X})$. The following assertions hold:

(1) For each $\bar\rho_0, \bar\rho_1 \in \mathscr{P}_\sigma(\mathcal{X})$ there exists a constant speed geodesic $\rho : [0,1] \to \mathscr{P}_\sigma(\mathcal{X})$ with $\rho_0 = \bar\rho_0$ and $\rho_1 = \bar\rho_1$.

(2) Let $\rho : [0,1] \to \mathscr{P}_\sigma(\mathcal{X})$ be a constant speed geodesic and let $\nabla\psi_t = \mathcal{I}_{\rho_t} \dot\rho_t$. Then the following equations hold for $t \in [0,1]$ and $x \in \mathcal{X}$:

\[ \partial_t \rho_t(x) = \sum_{y \in \mathcal{X}} (\psi_t(x) - \psi_t(y)) K(x,y) \hat\rho_t(x,y), \qquad \partial_t \psi_t(x) = -\frac12 \sum_{y \in \mathcal{X}} (\psi_t(x) - \psi_t(y))^2 K(x,y)\, \partial_1 \hat\rho_t(x,y). \tag{3.20} \]

Proof. Since $(\mathscr{P}_\sigma(\mathcal{X}), \mathcal{W})$ is a complete Riemannian manifold, (1) follows from the Hopf-Rinow theorem. The equations in (2) are the equations for the cogeodesic flow (see, e.g., [15, Theorem 1.9.3]) and follow directly from the representation of $\mathcal{W}$ as a Riemannian metric given in this section. □

Remark 3.32. The equations (3.20) should be compared to the geodesic equations for the $L^2$-Wasserstein metric over $\mathbb{R}^n$ (see [3], [23], [24]), which are given under appropriate assumptions by

\[ \partial_t \rho + \nabla \cdot (\rho \nabla\psi) = 0, \qquad \partial_t \psi + \frac12 |\nabla\psi|^2 = 0. \tag{3.21} \]

The equations (3.20) are a natural discrete analogue of (3.21). Note however that the equations for $\psi$ in the discrete case depend on $\rho$.



4. Gradient flows of entropy functionals 

We continue in the setting of Section 3, where $K$ is an irreducible and reversible Markov kernel on a finite set $\mathcal{X}$. We fix a function $\theta : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ satisfying Assumption 3.1 and consider the associated (pseudo-)metric $\mathcal{W}$ defined in Section 3. If $C_\theta < \infty$, we shall also assume that (A8) holds.

Since $\mathscr{P}_*(\mathcal{X})$ is a Riemannian manifold, as has been shown in Theorem 3.29, we are in a position to study gradient flows of smooth functionals defined on $\mathscr{P}_*(\mathcal{X})$. Let

\[ \Delta := K - I \]

denote the generator of the continuous time Markov semigroup $(e^{t\Delta})_{t \geq 0}$ associated with $K$. The main result in this section is Theorem 4.7, which asserts that solutions to the "heat equation" $\dot\rho_t = \Delta\rho_t$ are gradient flow trajectories of the entropy $\mathcal{H}$ with respect to the metric $\mathcal{W}$.

Notation. In view of Proposition 3.25, we shall always regard $T_\rho$ as being the tangent space of $\mathscr{P}_*(\mathcal{X})$ at $\rho \in \mathscr{P}_*(\mathcal{X})$. The tangent vector field along a smooth curve $t \mapsto \rho_t \in \mathscr{P}_*(\mathcal{X})$ will be denoted by

\[ t \mapsto D_t\rho \in T_{\rho_t}. \]

The gradient of a smooth functional $\mathcal{G} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ at $\rho \in \mathscr{P}_*(\mathcal{X})$ is denoted by

\[ \mathrm{grad}\, \mathcal{G}(\rho) \in T_\rho. \]

Functionals. We shall consider the following types of functionals: 

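• For a function $V : \mathcal{X} \to \mathbb{R}$ we consider the potential energy functional $\mathcal{V} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ defined by

\[ \mathcal{V}(\rho) := \sum_{x \in \mathcal{X}} V(x) \rho(x) \pi(x). \]

• For a differentiable function $f : (0, \infty) \to \mathbb{R}$, we consider the generalised entropy $\mathcal{F} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ defined by

\[ \mathcal{F}(\rho) := \sum_{x \in \mathcal{X}} f(\rho(x)) \pi(x). \]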

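Proposition 4.1 (Gradient of potential energy functionals). The functional $\mathcal{V} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ is differentiable, and for $\rho \in \mathscr{P}_*(\mathcal{X})$ we have

\[ \mathrm{grad}\, \mathcal{V}(\rho) = \nabla V. \]

Proof. Clearly, $\mathcal{V}$ is differentiable. Let $t \mapsto \rho_t \in \mathscr{P}_*(\mathcal{X})$ be a differentiable curve and let $\psi_t \in \mathrm{Ran}\, A(\rho_t)$ be such that $\nabla\psi_t := D_t\rho$. Then

\[ \frac{d}{dt} \mathcal{V}(\rho_t) = \sum_{x \in \mathcal{X}} V(x) \dot\rho_t(x) \pi(x) = \sum_{x \in \mathcal{X}} V(x) (B(\rho_t)\psi_t)(x) \pi(x) = -\langle V, \nabla \cdot (\hat\rho_t \bullet \nabla\psi_t) \rangle_\pi = \langle \nabla V, \hat\rho_t \bullet \nabla\psi_t \rangle_\pi = \langle \nabla V, \nabla\psi_t \rangle_{\rho_t}, \]

which yields the result. □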

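Proposition 4.2 (Gradient of generalised entropy functionals). The functional $\mathcal{F} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ is differentiable, and for $\rho \in \mathscr{P}_*(\mathcal{X})$ we have

\[ \mathrm{grad}\, \mathcal{F}(\rho) = \nabla(f' \circ \rho). \]

Proof. The differentiability of $\mathcal{F}$ is clear from its definition. Let $t \mapsto \rho_t \in \mathscr{P}_*(\mathcal{X})$ be a differentiable curve and let $\psi_t \in \mathrm{Ran}\, A(\rho_t)$ be such that $\nabla\psi_t := D_t\rho$. Since $f$ is differentiable, we obtain

\[ \frac{d}{dt} \mathcal{F}(\rho_t) = \sum_{x \in \mathcal{X}} f'(\rho_t(x)) \dot\rho_t(x) \pi(x) = \sum_{x \in \mathcal{X}} f'(\rho_t(x)) (B(\rho_t)\psi_t)(x) \pi(x) = -\langle f'(\rho_t), \nabla \cdot (\hat\rho_t \bullet \nabla\psi_t) \rangle_\pi = \langle \nabla f'(\rho_t), \hat\rho_t \bullet \nabla\psi_t \rangle_\pi = \langle \nabla f'(\rho_t), \nabla\psi_t \rangle_{\rho_t}, \]

which yields the result. □

In the special case where $\mathcal{F} = \mathcal{H}$ is the entropy functional from (1.1) we obtain:

Corollary 4.3. The functional $\mathcal{H} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ is differentiable, and for $\rho \in \mathscr{P}_*(\mathcal{X})$ we have

\[ \mathrm{grad}\, \mathcal{H}(\rho) = \nabla \log \rho. \]

Proof. This follows directly from Proposition 4.2. □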

Gradient flows. In order to study gradient flows, we impose the following 
assumption which will be in force throughout the remainder of this section. 

Assumption 4.4. In addition to Assumption 3.1 we assume:

(A9) There exists a function $k \in C^1((0, \infty); \mathbb{R})$ such that

\[ \theta(s,t) = \frac{s - t}{k(s) - k(t)} \]

for all $s, t > 0$ with $s \neq t$.

Recall that this assumption is satisfied if $\theta$ is the logarithmic mean, in which case $k(t) = \log(t)$.

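Proposition 4.5 (Tangent vector field along the heat flow). Let $\rho \in \mathscr{P}(\mathcal{X})$ and let $\rho_t = e^{t\Delta}\rho$, $t \geq 0$, denote the heat flow. Then $t \mapsto \rho_t$ is $C^\infty$ on $(0, \infty)$ and for $t > 0$ we have

\[ D_t\rho = -\nabla(k \circ \rho_t). \]

Proof. The differentiability assertion follows from general Markov chain theory. For any $\rho \in \mathscr{P}_*(\mathcal{X})$, we have

\[ \hat\rho(x,y) = \frac{\rho(x) - \rho(y)}{k(\rho(x)) - k(\rho(y))}, \]

and therefore

\[ \Delta\rho = \nabla \cdot (\nabla\rho) = \nabla \cdot (\hat\rho \bullet \nabla(k \circ \rho)). \]

Since $t \mapsto \rho_t$ solves the heat equation $\dot\rho_t = \Delta\rho_t$, it follows that

\[ \dot\rho_t - \nabla \cdot (\hat\rho_t \bullet \nabla(k \circ \rho_t)) = 0, \]

hence $D_t\rho = -\nabla(k \circ \rho_t)$ by Proposition 3.26. □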
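The key identity in the proof, $\Delta\rho = \nabla\cdot(\hat\rho \bullet \nabla(k\circ\rho))$, can be checked numerically in the case of the logarithmic mean with $k = \log$. The sketch below is an illustration on a hypothetical three-state chain; it writes both sides out pointwise, using $(\Delta\rho)(x) = \sum_y K(x,y)(\rho(y) - \rho(x))$ and $\nabla\cdot(\hat\rho\bullet\nabla(k\circ\rho))(x) = \sum_y K(x,y)\hat\rho(x,y)(k(\rho(y)) - k(\rho(x)))$, where the second expression assumes the sign conventions $\nabla\psi(x,y) = \psi(x) - \psi(y)$ and $(\nabla\cdot\Psi)(x) = \frac12\sum_y(\Psi(y,x)-\Psi(x,y))K(x,y)$.

```python
import math

def log_mean(s, t):
    # Logarithmic mean, with theta(s, s) = s and theta = 0 on the boundary.
    if s == 0.0 or t == 0.0:
        return 0.0
    if s == t:
        return s
    return (s - t) / (math.log(s) - math.log(t))

# Hypothetical reversible three-state chain and strictly positive density.
K = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]
rho = [1.5, 0.9, 0.9]
n = len(rho)

# Left-hand side: (Delta rho)(x) = sum_y K(x,y) (rho(y) - rho(x)).
delta_rho = [sum(K[x][y] * (rho[y] - rho[x]) for y in range(n))
             for x in range(n)]

# Right-hand side: sum_y K(x,y) rho_hat(x,y) (log rho(y) - log rho(x)).
k = [math.log(r) for r in rho]
div_form = [sum(K[x][y] * log_mean(rho[x], rho[y]) * (k[y] - k[x])
                for y in range(n)) for x in range(n)]

identity_holds = all(abs(a - b) < 1e-12 for a, b in zip(delta_rho, div_form))
print(identity_holds)
```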

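We slightly modify the usual definition of a gradient flow trajectory, as we wish to allow for initial values that do not belong to $\mathscr{P}_*(\mathcal{X})$.

Definition 4.6 (Gradient flow). Let $\mathcal{F} : \mathscr{P}_*(\mathcal{X}) \to \mathbb{R}$ be differentiable. A curve $\rho : [0, \infty) \to \mathscr{P}(\mathcal{X})$ is said to be a gradient flow trajectory for $\mathcal{F}$ starting from $\bar\rho \in \mathscr{P}(\mathcal{X})$ if the following assertions hold:

(1) $t \mapsto \rho_t$ is differentiable on $(0, \infty)$, for every $t > 0$ we have $\rho_t \in \mathscr{P}_*(\mathcal{X})$ and

\[ D_t\rho = -\mathrm{grad}\, \mathcal{F}(\rho_t). \]

(2) $t \mapsto \rho_t$ is continuous in total variation at $t = 0$ and $\rho_0 = \bar\rho$.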

Theorem 4.7. Let $f \in C^2((0, \infty); \mathbb{R})$ be such that $f' = k$ and let $\rho \in \mathscr{P}(\mathcal{X})$. Then the heat flow $t \mapsto e^{t\Delta}\rho$ is a gradient flow trajectory for the functional $\mathcal{F}$ with respect to $\mathcal{W}$.

Proof. The first condition in Definition 4.6 is a consequence of Propositions 4.2 and 4.5. The second one follows from general Markov chain theory. □



Corollary 4.8 (Heat flow is gradient flow of the entropy). Let $\theta$ be the logarithmic mean defined by $\theta(s,t) = \int_0^1 s^{1-p} t^p \, dp$ and let $\rho \in \mathscr{P}(\mathcal{X})$. Then the heat flow $t \mapsto e^{t\Delta}\rho$ is a gradient flow trajectory for the entropy $\mathcal{H}$ with respect to $\mathcal{W}$.

Proof. This is a special case of Theorem 4.7 with $k(t) = 1 + \log t$ and $f(t) = t \log t$. □
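One visible consequence of the gradient flow structure is that the entropy decreases along the heat flow. The sketch below illustrates this on a hypothetical three-state chain. It replaces the exact semigroup $e^{t\Delta}$ by an explicit Euler scheme with step `dt`, which is a discretisation chosen for this example and not part of the text; the starting density vanishes at one state, so it lies outside $\mathscr{P}_*(\mathcal{X})$, as allowed in Definition 4.6.

```python
import math

# Hypothetical reversible three-state chain.
K = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]
pi = [1 / 6, 1 / 2, 1 / 3]
n = len(pi)

def entropy(rho):
    # H(rho) = sum_x rho(x) log rho(x) pi(x), with the convention 0 log 0 = 0.
    return sum(r * math.log(r) * p for r, p in zip(rho, pi) if r > 0.0)

def euler_step(rho, dt):
    # One explicit Euler step for the heat equation: rho + dt * (K - I) rho.
    return [rho[x] + dt * sum(K[x][y] * (rho[y] - rho[x]) for y in range(n))
            for x in range(n)]

rho = [3.0, 0.0, 1.5]   # density w.r.t. pi that vanishes at the middle state
dt = 0.01
entropies = [entropy(rho)]
for _ in range(2000):
    rho = euler_step(rho, dt)
    entropies.append(entropy(rho))

mass = sum(r * p for r, p in zip(rho, pi))
monotone = all(b <= a + 1e-12 for a, b in zip(entropies, entropies[1:]))
print(monotone, mass, rho)
```

In this run the entropy is non-increasing, the total mass stays equal to one, and the density approaches the uniform density $\mathbf{1}$, which is the minimiser of $\mathcal{H}$.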

Appendix A. A result from the theory of diagonally dominant 

MATRICES 

The following result from the theory of diagonally theory is a special case 
of [8]. For the convenience of the reader we present a simple proof. 

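Lemma A.1. Let $A = (a_{ij})_{i,j=1,\ldots,n}$ be a real matrix satisfying

(1) $\forall i : a_{ii} \geq 0$, (2) $\forall i \neq j : a_{ij} = a_{ji} \leq 0$, (3) $\forall i : \sum_j a_{ij} = 0$.

Consider the equivalence relation $\sim$ on $I = \{1, \ldots, n\}$ defined by

$$i \sim j \quad :\Longleftrightarrow \quad i = j , \ \text{or} \ \exists k \geq 1 \ \exists i_1, \ldots, i_k \in I : a_{i,i_1}, a_{i_1,i_2}, \ldots, a_{i_k,j} < 0 ,$$

and let $(I_\alpha)_\alpha$, with $I_\alpha \subseteq I$, denote the corresponding equivalence classes. Then the following identities hold:

$$\operatorname{Ker} A = \{ (x_i) \in \mathbb{R}^n \mid x_i = x_j \text{ whenever } i \sim j \} , \tag{A.1}$$

$$\operatorname{Ran} A = \Big\{ (x_i) \in \mathbb{R}^n \ \Big| \ \forall \alpha : \sum_{i \in I_\alpha} x_i = 0 \Big\} . \tag{A.2}$$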

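Proof. First we remark that the assumptions (1)-(3) imply that $a_{ij} = 0$ if $i \in I_\alpha$ and $j \in I_\beta$ for some $\alpha \neq \beta$. Furthermore, it suffices to show (A.1), since (A.2) then follows by duality (as $A$ is symmetric, $\operatorname{Ran} A$ is the orthogonal complement of $\operatorname{Ker} A$).

To show "$\supseteq$", suppose that $x = (x_i)$ satisfies $x_i = x_j$ whenever $i \sim j$. Fix $k \in I$ and take $\beta$ such that $k \in I_\beta$. Using the remark and (3), it follows that

$$\sum_{j \in I} a_{kj} x_j = \sum_{j \in I_\beta} a_{kj} x_j = x_k \sum_{j \in I_\beta} a_{kj} = x_k \sum_{j \in I} a_{kj} = 0 ,$$

which yields the desired inclusion.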

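Conversely, to show "$\subseteq$", we use the identity

$$2 x_i x_j = x_i^2 + x_j^2 - (x_i - x_j)^2$$

to write, for $x = (x_i)$,

$$2\langle Ax, x\rangle = 2 \sum_{i,j \in I} a_{ij} x_i x_j = \sum_{i \in I} x_i^2 \sum_{j \in I} a_{ij} + \sum_{j \in I} x_j^2 \sum_{i \in I} a_{ij} - \sum_{i,j \in I} a_{ij} (x_i - x_j)^2 .$$

Using (3) and the symmetry of $A$ we infer that

$$\langle Ax, x\rangle = -\frac{1}{2} \sum_{i,j \in I} a_{ij} (x_i - x_j)^2 .$$

Consequently, if $Ax = 0$, it follows that $\langle Ax, x\rangle = 0$. Since every term $-a_{ij}(x_i - x_j)^2$ is nonnegative by (2), we obtain $x_i = x_j$ whenever $a_{ij} < 0$, hence $x_i = x_j$ whenever $i \sim j$, which completes the proof. □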
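
The statements of Lemma A.1 can be checked on a small example. The following Python sketch is an illustration under stated assumptions: the $5 \times 5$ matrix, the test vectors, and all function names are hypothetical choices, not taken from the paper. The matrix satisfies (1)-(3) and has two equivalence classes, $\{0, 1, 2\}$ and $\{3, 4\}$ in zero-based indexing. The script evaluates both sides of the quadratic-form identity from the proof, applies the matrix to a class-wise constant vector, and sums a vector in the range over each class.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def quad_form(A, x):
    """The inner product <Ax, x>."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


def dirichlet_form(A, x):
    """The expression -(1/2) * sum_{i,j} a_ij (x_i - x_j)^2."""
    n = len(x)
    return -0.5 * sum(
        A[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )


# Symmetric, zero row sums, nonpositive off-diagonal entries (hypothetical data).
A = [
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [-1.0, 3.0, -2.0, 0.0, 0.0],
    [0.0, -2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, -4.0],
    [0.0, 0.0, 0.0, -4.0, 4.0],
]
classes = [[0, 1, 2], [3, 4]]

x_const = [7.0, 7.0, 7.0, -2.0, -2.0]   # constant on each class
x_generic = [1.0, -2.0, 0.5, 3.0, 4.0]  # arbitrary vector

kernel_image = matvec(A, x_const)       # expected to vanish by (A.1)
range_vector = matvec(A, x_generic)     # an element of Ran A
class_sums = [sum(range_vector[i] for i in c) for c in classes]

if __name__ == "__main__":
    print(quad_form(A, x_generic), dirichlet_form(A, x_generic))
    print(kernel_image, class_sums)
```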

Appendix B. Uniqueness of the metric on the two-point space 

In this appendix we shall prove Proposition 2.13. First we need two 
definitions. Let (M , d) be a metric space. 

Definition B.1. Let $I \subseteq \mathbb{R}$ be an interval and let $1 \leq p \leq \infty$. A curve $\gamma : I \to M$ is said to be p-absolutely continuous if there exists a function $m \in L^p(I; \mathbb{R})$ such that

$$d(\gamma(s), \gamma(t)) \leq \int_s^t m(r) \, dr$$

for all $s, t \in I$ with $s \leq t$. The curve $\gamma$ is locally p-absolutely continuous if it is p-absolutely continuous on each compact subinterval of $I$.

We shall use the notation $\gamma \in AC^p(I; M)$ and $\gamma \in AC^p_{\mathrm{loc}}(I; M)$ respectively.

The following notion of gradient flow in a metric space $(M, d)$ has been
studied in great detail in [1].

Definition B.2. Let $F : M \to \mathbb{R} \cup \{+\infty\}$ be lower semicontinuous and not identically $+\infty$. A curve $\gamma \in C([0, \infty); M) \cap AC^2_{\mathrm{loc}}((0, \infty); M)$ is said to satisfy the evolution variational inequality $(\mathrm{EVI}_\lambda(F))$, for $\lambda \in \mathbb{R}$, if, for any $y \in D(F)$, the inequality

$$\frac{1}{2} \frac{d}{dt} d^2(\gamma(t), y) + \frac{\lambda}{2} d^2(\gamma(t), y) \leq F(y) - F(\gamma(t)) \tag{B.1}$$

holds a.e. on $(0, \infty)$.

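Proof of Proposition 2.13. Recall that $\bar\beta = \frac{p-q}{p+q}$. Let $\beta \in (\bar\beta, 1)$ and suppose that there exists $\alpha \in (-1, 1)$ such that

$$\mathcal{M}(\rho^{\bar\beta}, \rho^\beta) = \mathcal{M}(\rho^{\bar\beta}, \rho^\alpha) + \mathcal{M}(\rho^\alpha, \rho^\beta) . \tag{B.2}$$

We claim that $\alpha \in [\bar\beta, \beta]$. To prove this, suppose first, to obtain a contradiction, that $\alpha > \beta$. Then there exists $T > 0$ such that $e^{T(K-I)}\rho^\alpha = \rho^\beta$, hence (B.1) implies that

$$\mathcal{M}(\rho^\beta, \rho^{\bar\beta})^2 - \mathcal{M}(\rho^\alpha, \rho^{\bar\beta})^2 \leq 2T \big( \mathcal{H}(\rho^{\bar\beta}) - \mathcal{H}(\rho^\beta) \big) \leq 0 .$$

In view of (B.2), it follows that $\mathcal{M}(\rho^\alpha, \rho^\beta) = 0$, thus $\alpha = \beta$, which contradicts the assumption. Suppose now that $\alpha < \bar\beta$. Adding (B.2) and the inequality in (2) we infer that $\rho^\alpha = \rho^{\bar\beta}$, hence $\alpha = \bar\beta$, which proves the claim.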

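Now, fix $\beta \in (\bar\beta, 1)$ and let $t \mapsto \rho^{\psi(t)}$ be a speed-1 geodesic with $\psi(0) = \bar\beta$ and $\psi(T) = \beta$, where $T = \mathcal{M}(\rho^{\bar\beta}, \rho^\beta)$. For $0 \leq s < t \leq T$ we then have $\mathcal{M}(\rho^{\bar\beta}, \rho^{\psi(t)}) = \mathcal{M}(\rho^{\bar\beta}, \rho^{\psi(s)}) + \mathcal{M}(\rho^{\psi(s)}, \rho^{\psi(t)})$, thus the claim implies that $\psi(s) \leq \psi(t)$. Since $\psi$ is a geodesic, we have $\psi(s) \neq \psi(t)$, thus $\psi$ is strictly increasing on $[0, T]$.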

Now we claim that $\psi$ is continuous on $[0, T]$. To show this, take $t \in (0, T)$. Since $\psi$ is increasing, the limits $\psi(t-)$ and $\psi(t+)$ exist and for any $\varepsilon > 0$ we have $\mathcal{M}(\rho^{\psi(t-)}, \rho^{\psi(t+)}) \leq \mathcal{M}(\rho^{\psi(t-\varepsilon)}, \rho^{\psi(t+\varepsilon)}) = 2\varepsilon$, thus $\psi(t-) = \psi(t+)$. A similar argument shows that $\psi$ is continuous at $0$ and $T$, thus $\psi$ is continuous on $[0, T]$. Since $\psi$ is continuous and strictly increasing we infer that the mapping $\psi : [0, T] \to [\bar\beta, \beta]$ is surjective. As a consequence, the inverse mapping $\varphi : [\bar\beta, \beta] \to [0, T]$ is well-defined, and continuous and strictly increasing as well.

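Note that the mapping

$$I : t \mapsto \rho^{\psi(t)} \tag{B.3}$$

defines an isometry from $[0, T]$ endowed with the euclidean metric onto $\{\rho^\alpha : \alpha \in [\bar\beta, \beta]\} \subset \mathscr{P}(\mathcal{X})$ endowed with the metric $\mathcal{M}$. The inverse mapping is given by

$$J : \rho^\alpha \mapsto \varphi(\alpha) .$$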

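Since $u : t \mapsto \rho^{\beta_t}$ is a 2-absolutely continuous curve satisfying $\mathrm{EVI}_0(\mathcal{H})$ for the metric $\mathcal{M}$, (B.3) implies that the mapping

$$t \mapsto \tilde u(t) := J(u(t)) = \varphi(\beta_t)$$

is a 2-absolutely continuous curve satisfying $\mathrm{EVI}_0(\tilde{\mathcal{H}})$, where $\tilde{\mathcal{H}} := \mathcal{H} \circ I$, for the euclidean metric. It follows that the mapping $\varphi : [\bar\beta, \beta] \to [0, T]$ itself is absolutely continuous, hence almost everywhere differentiable, and the same holds for its inverse $\psi$. Moreover, the identity

$$\psi'(\varphi(\alpha)) = \frac{1}{\varphi'(\alpha)} \tag{B.4}$$

holds for a.e. $\alpha \in [\bar\beta, \beta]$.

For any $\alpha \in [\bar\beta, \beta]$ we have

$$\mathcal{H}(\rho^\alpha) = \frac{q}{p+q} \, f\Big( \frac{p+q}{q} \, \frac{1-\alpha}{2} \Big) + \frac{p}{p+q} \, f\Big( \frac{p+q}{p} \, \frac{1+\alpha}{2} \Big) ,$$

thus, for $r \in (0, T)$,

$$\tilde{\mathcal{H}}(r) = \frac{q}{p+q} \, f\Big( \frac{p+q}{q} \, \frac{1-\psi(r)}{2} \Big) + \frac{p}{p+q} \, f\Big( \frac{p+q}{p} \, \frac{1+\psi(r)}{2} \Big) .$$

It follows that $\tilde{\mathcal{H}}$ is a.e. differentiable and the identity

$$\tilde{\mathcal{H}}'(r) = \frac{\psi'(r)}{2} \Big[ f'\Big( \frac{p+q}{p} \, \frac{1+\psi(r)}{2} \Big) - f'\Big( \frac{p+q}{q} \, \frac{1-\psi(r)}{2} \Big) \Big] \tag{B.5}$$

holds a.e.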

Since $t \mapsto \tilde u(t)$ is a 2-absolutely continuous curve satisfying $\mathrm{EVI}_0(\tilde{\mathcal{H}})$ and since the functional $\tilde{\mathcal{H}}$ is differentiable a.e., it follows from [1, Proposition 1.4.1] that the gradient flow equation

$$\tilde u'(t) = -\tilde{\mathcal{H}}'(\tilde u(t))$$

holds almost everywhere.

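Since $\varphi$ is differentiable a.e., the left-hand side equals a.e.

$$\tilde u'(t) = \frac{d}{dt} \varphi(\beta_t) = \big( p(1 - \beta_t) - q(1 + \beta_t) \big) \varphi'(\beta_t) .$$

Taking (B.4) into account, it follows from (B.5) that the right-hand side equals a.e.

$$-\tilde{\mathcal{H}}'(\tilde u(t)) = -\frac{1}{2 \varphi'(\beta_t)} \Big( f'(\rho^{\beta_t}(b)) - f'(\rho^{\beta_t}(a)) \Big) .$$

Combining the latter two identities we infer that for a.e. $\alpha \in [\bar\beta, \beta]$,

$$\varphi'(\alpha) = \frac{1}{2} \, \frac{f'(\rho^\alpha(b)) - f'(\rho^\alpha(a))}{\big( q(1+\alpha) - p(1-\alpha) \big) \varphi'(\alpha)} .$$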
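Since $\varphi$ is absolutely continuous,

$$\varphi(\beta) - \varphi(\alpha) = \int_\alpha^\beta \varphi'(\gamma) \, d\gamma = \int_\alpha^\beta \sqrt{ \frac{f'(\rho^\gamma(b)) - f'(\rho^\gamma(a))}{2 \big( q(1+\gamma) - p(1-\gamma) \big)} } \, d\gamma ,$$

hence, since $t \mapsto \rho^{\psi(t)}$ is a speed-1 geodesic, we obtain for $\bar\beta \leq \alpha \leq \beta$,

$$\mathcal{M}(\rho^\alpha, \rho^\beta) = \mathcal{M}(\rho^{\psi(\varphi(\alpha))}, \rho^{\psi(\varphi(\beta))}) = \varphi(\beta) - \varphi(\alpha) .$$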

Thus the distance between $\rho^\alpha$ and $\rho^\beta$ is uniquely determined for all $\alpha, \beta \geq \bar\beta$. The same argument shows that the distance is uniquely determined for $\alpha, \beta \leq \bar\beta$. The case $\alpha < \bar\beta < \beta$ follows from the assumption (2). □
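
The integral representation obtained at the end of the proof can be evaluated numerically. The Python sketch below is a hedged illustration, not the paper's procedure. It assumes the entropy case $f(t) = t \log t$ and the parametrisation $\rho^\alpha(a) = \frac{p+q}{q} \frac{1-\alpha}{2}$, $\rho^\alpha(b) = \frac{p+q}{p} \frac{1+\alpha}{2}$, so that $f'(\rho^\alpha(b)) - f'(\rho^\alpha(a)) = \log\big(q(1+\alpha)\big) - \log\big(p(1-\alpha)\big)$, and it reads the integrand as $\sqrt{\big(f'(\rho^\alpha(b)) - f'(\rho^\alpha(a))\big) / \big(2(q(1+\alpha) - p(1-\alpha))\big)}$. The names `phi_prime` and `distance` and the midpoint quadrature are hypothetical choices made for this example.

```python
import math


def phi_prime(alpha, p, q):
    """Integrand for f(t) = t log t, assuming the parametrisation in the lead-in."""
    s = q * (1.0 + alpha)
    t = p * (1.0 - alpha)
    if s == t:
        # Limit at the equilibrium point: (log s - log t) / (s - t) -> 1 / s.
        return math.sqrt(1.0 / (2.0 * s))
    return math.sqrt((math.log(s) - math.log(t)) / (2.0 * (s - t)))


def distance(alpha, beta, p, q, n=20000):
    """Midpoint-rule approximation of the integral of phi_prime over [alpha, beta].

    Intended for beta_bar <= alpha <= beta < 1 with beta_bar = (p - q) / (p + q).
    """
    h = (beta - alpha) / n
    return h * sum(phi_prime(alpha + (i + 0.5) * h, p, q) for i in range(n))


if __name__ == "__main__":
    # Symmetric chain p = q = 1, for which beta_bar = 0.
    print(distance(0.0, 0.5, p=1.0, q=1.0))
    # Asymmetric chain p = 2, q = 1, for which beta_bar = 1/3.
    print(distance(1.0 / 3.0, 0.8, p=2.0, q=1.0))
```

For $p = q$ the integrand reduces to $\sqrt{\operatorname{arctanh}(\alpha) / (2p\alpha)}$, which gives a convenient cross-check of the implementation.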